
Quorum in Kubernetes and Swarm Pt. 2: Building for Fault Tolerance

By Bret Fisher • Issue #11
Today I talk all about the different levels of fault tolerance for Raft, and how to design for high availability. It’s the rule of three.

Building For Raft Fault Domains
Raft Fault Domains, in Marker Form
Fault tolerance takes many forms because there are many fault domains (also called failure domains) to consider. When we’re discussing server cluster design, particularly in the cloud, we’re usually limiting the conversation to node fault tolerance (for your apps) and control plane fault tolerance (for the Raft database and management APIs). We’ll assume the infrastructure below it and the apps you’re running on top are out of scope for this discussion (and need their own FT solutions).
Because Swarm has a built-in Raft log, and Kubernetes uses etcd as its datastore (which is also a Raft implementation), they both have the same rule: you need a minimum of three running copies before one can fail and the cluster still has quorum. So here’s a quick Raft consensus guide on the different levels of infrastructure fault domains.
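The arithmetic behind that rule is small enough to sketch. Assuming standard Raft majority voting, a cluster of n voting members needs n // 2 + 1 of them reachable, and it tolerates whatever is left over. The function names here are mine, for illustration only; they are not part of Swarm, Kubernetes, or etcd.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a Raft cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Print quorum size and failure tolerance for small cluster sizes.
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that two members tolerate zero failures and four tolerate only one, the same as three, which is why three is the smallest useful size.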
Region Fault Tolerance: First let me get this out of the way. You should not design a single Swarm or Kubernetes cluster across cloud regions or across datacenters where the latency averages above 10ms and/or there is network address translation (NAT) or external firewalls between nodes. None of the official Docker and Kubernetes design guides recommend this setup, for many reasons that I could write a whole article on. Connecting multiple clusters into one management plane is known as federation, and it is something you should only consider after you’re really good at running the clusters by themselves. Federation is a topic for another day.
Zone Fault Tolerance: Also called Availability Zones, these are datacenters (usually in the same city) that have 10ms latency or less and no NAT between them. All the major cloud providers offer them. The key is to pick a region with at least three zones and spread your control plane across exactly three of them. If you only use two zones, you can’t guarantee Raft quorum when one zone fails, because the zone holding the majority of Raft nodes might be the one that goes down. I recommend drawing this out: place your Raft nodes in different zones, then take a zone down. Are a majority of Raft nodes still healthy? You’ll soon see that two zones can’t pass this test, and that a fourth zone tolerates no more zone failures than three. The same rule applies to racks and nodes: use an odd number.
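The draw-it-out exercise can also be run as a small simulation. This is a minimal sketch, assuming each control plane node lives in exactly one zone and a zone failure takes down every node in it. The zone labels and the function name are hypothetical.

```python
from collections import Counter


def survives_any_zone_loss(placement: list[str]) -> bool:
    """True if a Raft majority remains after losing any single zone.

    `placement` lists the zone of each control plane node,
    e.g. ["a", "b", "c"] for one node in each of three zones.
    """
    if not placement:
        raise ValueError("placement must contain at least one node")
    total = len(placement)
    quorum = total // 2 + 1
    nodes_per_zone = Counter(placement)
    # Losing a zone removes every node placed in it.
    return all(total - lost >= quorum for lost in nodes_per_zone.values())


if __name__ == "__main__":
    print(survives_any_zone_loss(["a", "b", "c"]))       # three nodes, three zones
    print(survives_any_zone_loss(["a", "b"]))            # two nodes, two zones
    print(survives_any_zone_loss(["a", "a", "b"]))       # three nodes, two zones
    print(survives_any_zone_loss(["a", "a", "b", "b", "c"]))  # five nodes, three zones
```

The two-zone layouts fail the check however the nodes are divided, while three or five nodes spread over three zones pass it.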
Rack Fault Tolerance: In many datacenters, the top-of-rack switch is often not fault tolerant, so you would take steps to ensure not all control plane nodes are in the same rack. This isn’t normally a concern in the cloud, but if you do control which rack nodes are in, then the same rules as Zones apply here. You ideally want the control plane nodes spread out across three racks.
Node Fault Tolerance: This follows the same rules as above, but is the one most talked about in blogs and documentation. For Raft to have consensus, a majority of control plane nodes (Kubernetes Masters or Swarm Managers) must be healthy and reachable. If you have three, then two must be healthy. If you have five, then three must be healthy. Spread these out across the three racks and three zones so there’s at least one in each fault domain.
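One simple way to get "at least one in each" is round-robin placement. The sketch below is only an illustration of that idea, not how Swarm or Kubernetes schedules anything; the node names (m1 to m5) and zone names are made up.

```python
from collections import Counter
from itertools import cycle


def spread(nodes: list[str], domains: list[str]) -> dict[str, str]:
    """Assign nodes to fault domains round-robin, so per-domain counts differ by at most one."""
    if not domains:
        raise ValueError("need at least one fault domain")
    return dict(zip(nodes, cycle(domains)))


def has_quorum(healthy: int, total: int) -> bool:
    """True if `healthy` nodes form a majority of `total` control plane nodes."""
    return healthy >= total // 2 + 1


if __name__ == "__main__":
    managers = ["m1", "m2", "m3", "m4", "m5"]
    zones = ["zone-a", "zone-b", "zone-c"]
    placement = spread(managers, zones)
    print(placement)
    print(Counter(placement.values()))
    # Five managers: three healthy is enough, two is not.
    print(has_quorum(3, 5), has_quorum(2, 5))
```

With five managers and three zones, no zone holds more than two managers, so losing any one zone still leaves three of five healthy.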
What You Might Have Missed...
Sweet Terminal and Shell Setups: DevOps and Docker Live Show (Ep 55)
Live Docker, Kubernetes, and Swarm Q&A: DevOps and Docker Live Show (Ep 56)
Thanks for reading,
– Bret
Weekly YouTube Live: bretfisher.com/youtube
Course Coupons: bretfisher.com/courses