The system uses etcd to store all of its cluster data. etcd is an open source, distributed key-value store that is well suited to maintaining the state of Kubernetes. Failures in the etcd cluster at the heart of Kubernetes will cause Kubernetes itself to fail. To mitigate this risk, the system deploys etcd on dedicated disks with a configuration tuned for Kubernetes workloads. The system also provides additional etcd clusters as necessary to help maintain the operational state of services; these additional clusters are managed by a Kubernetes operator and do not interact with the core Kubernetes etcd service.
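As an illustration (not code from the system), the following minimal sketch uses the official etcd Go client, go.etcd.io/etcd/client/v3, to report the status of each member of a three-node cluster, including which member is the current Raft leader. The ncn-m001 through ncn-m003 endpoint names are placeholders, and a real deployment would normally also require TLS credentials in the client configuration.

```go
// etcd_status.go - a minimal sketch that queries the status of each etcd
// member using the official Go client. Endpoint names are placeholders.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints; a production client would also set TLS config.
	endpoints := []string{"ncn-m001:2379", "ncn-m002:2379", "ncn-m003:2379"}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("failed to connect to etcd: %v", err)
	}
	defer cli.Close()

	// Status reports per-endpoint details, including the member ID of the
	// current Raft leader and the on-disk database size.
	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		resp, err := cli.Status(ctx, ep)
		cancel()
		if err != nil {
			fmt.Printf("%s: unreachable: %v\n", ep, err)
			continue
		}
		fmt.Printf("%s: version=%s dbSize=%dB leader=%x raftIndex=%d\n",
			ep, resp.Version, resp.DbSize, resp.Leader, resp.RaftIndex)
	}
}
```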
To learn more about etcd, refer to the official documentation at https://etcd.io/docs/.
Communication between etcd members is handled by the Raft consensus algorithm. Latency from the etcd leader is the most important metric to track, because severe latency introduces instability within the cluster: Raft can only commit as fast as the slowest member of the majority. This problem can be mitigated by properly tuning the cluster.
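One rough way to observe the cost of that leader round trip is to compare a linearizable read, which must be confirmed by the Raft leader before it returns, against a serializable read, which is served from the local member without consulting the leader. The sketch below is an illustration under that assumption; the endpoint and key are placeholders.

```go
// latency_probe.go - a minimal sketch comparing linearizable reads (which
// involve a round trip to the Raft leader) with serializable reads (served
// locally, possibly stale) to expose leader latency.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// timedGet measures the wall-clock time of a single Get request.
func timedGet(cli *clientv3.Client, key string, opts ...clientv3.OpOption) time.Duration {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	start := time.Now()
	if _, err := cli.Get(ctx, key, opts...); err != nil {
		log.Fatalf("get failed: %v", err)
	}
	return time.Since(start)
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"ncn-m001:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("failed to connect to etcd: %v", err)
	}
	defer cli.Close()

	// Linearizable (default): confirmed by the leader before returning.
	linear := timedGet(cli, "health-probe")

	// Serializable: answered from the local member's store, skipping the
	// leader round trip, so it is typically faster but may be stale.
	serial := timedGet(cli, "health-probe", clientv3.WithSerializable())

	fmt.Printf("linearizable read: %v\nserializable read: %v\n", linear, serial)
}
```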
etcd is a distributed, highly available key-value store that runs on the three non-compute nodes (NCNs) that act as Kubernetes master nodes. The three-node deployment meets the minimum cluster size for resiliency: it can tolerate the loss of one member. Scaling to more nodes provides more resiliency, but it does not provide more speed, because every write must be replicated to every member. In a three-node cluster, one logical write results in three physical writes, one to each instance; in a cluster of five or more instances, that same write is replicated five or more times.
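The quorum arithmetic behind this trade-off can be sketched in a few lines of Go (an illustration, not code from the system): a write commits once a majority of members acknowledge it, so a larger cluster tolerates more failures but fans each write out to more members.

```go
// quorum.go - the sizing trade-off in numbers: more members means more
// tolerated failures, but also more replicas of every write.
package main

import "fmt"

func main() {
	for _, members := range []int{1, 3, 5, 7} {
		quorum := members/2 + 1       // majority needed to commit a write
		tolerated := members - quorum // members that can fail without losing quorum
		fmt.Printf("%d members: quorum=%d, tolerates %d failure(s), each write replicated to all %d members\n",
			members, quorum, tolerated, members)
	}
}
```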
The system uses etcd in two major ways: