Cordon and drain are administrative operations in Kubernetes used to safely take a worker node offline for maintenance, upgrades, or decommissioning. "Cordoning" marks the node as unschedulable for new pods, while "draining" evicts the existing pods so that their controllers can recreate them on other healthy nodes.
These commands help limit workload disruption during cluster administration. Here is a breakdown of how both processes work:
1. Cordon: Closing the Door to New Work
When you cordon a node, Kubernetes stops scheduling new pods on that machine.
- What happens: The node is marked unschedulable, and its status in kubectl get nodes gains SchedulingDisabled (for example, Ready,SchedulingDisabled).
- Effect on existing workloads: None. Any pods currently running on the node continue to operate normally until they finish or are deleted.
- When to use it: When you want to gracefully phase out a node or prepare it for a drain, but you want existing pods to finish their current tasks first.
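To confirm that a cordon took effect, one option is to read the node's spec.unschedulable field, which is what kubectl cordon sets. The helper below is a minimal sketch; the function name is_cordoned and the node name worker-1 are hypothetical.

```shell
# is_cordoned NODE
# Hypothetical helper: exits 0 if NODE is cordoned, non-zero otherwise.
is_cordoned() {
  # kubectl cordon sets .spec.unschedulable to true; the field is
  # absent (empty output) on a node that still accepts new pods.
  [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}
```

It could be used as: kubectl cordon worker-1 && is_cordoned worker-1 && echo "worker-1 is cordoned"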
2. Drain: Emptying the Node
When you drain a node, you perform a full evacuation of the machine.
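- What happens: First, kubectl cordons the node to ensure no new pods land on it. Then it evicts the running pods through the Eviction API, giving each pod its termination grace period to shut down cleanly.
- Effect on existing workloads: Pods managed by a controller such as a Deployment, ReplicaSet, or StatefulSet are terminated on the current node and recreated on other schedulable nodes. Standalone pods with no controller are not recreated, and drain refuses to delete them unless you pass --force.
- When to use it: When you need to shut down the server, reboot it, or perform hardware/OS maintenance while keeping application downtime to a minimum. Avoiding downtime entirely depends on having enough replicas and spare capacity on the remaining nodes.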
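Before draining, it can help to see what is running on the node and what a drain would touch. The sketch below assumes a node named worker-1; the function names pods_on_node and preview_drain are hypothetical, while the kubectl flags are standard ones.

```shell
# pods_on_node NODE
# Lists every pod scheduled on NODE, across all namespaces.
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# preview_drain NODE
# Client-side dry run: reports what drain would do without changing anything.
preview_drain() {
  kubectl drain "$1" --ignore-daemonsets --dry-run=client
}
```

Running pods_on_node worker-1 followed by preview_drain worker-1 gives a picture of the evacuation before any pod is actually evicted.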
How to Execute (Commands)
These operations are performed using the kubectl command-line tool.
- To cordon a node: kubectl cordon <node-name>
- To drain a node: kubectl drain <node-name>
- To reverse the process (make it available again): kubectl uncordon <node-name>
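Put together, a maintenance window usually follows the order drain, do the work, uncordon. The wrapper below is only a sketch of that sequence: maintain_node is a hypothetical name, and the 300-second timeout is an arbitrary choice rather than a recommended value.

```shell
# maintain_node NODE COMMAND [ARGS...]
# Hypothetical wrapper: drain NODE, run COMMAND, then uncordon NODE.
maintain_node() {
  node="$1"
  shift

  # drain cordons the node first, so a separate cordon step is optional.
  # If the drain fails, stop here without uncordoning.
  kubectl drain "$node" --ignore-daemonsets --timeout=300s || return 1

  # Run the maintenance step supplied by the caller; stop if it fails.
  "$@" || return 1

  # Make the node schedulable again.
  kubectl uncordon "$node"
}
```

For example, maintain_node worker-1 ./upgrade-node.sh worker-1 would drain worker-1, run the (hypothetical) upgrade-node.sh script, and uncordon the node only if both steps succeed. Leaving the node cordoned on failure is a deliberate choice here, so that nothing new is scheduled onto a half-maintained machine.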
Best Practices & Things to Know
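- DaemonSets: DaemonSet pods typically run on every node to provide cluster-wide services, and the DaemonSet controller would recreate them immediately, so drain does not evict them. By default, kubectl drain refuses to proceed if the node runs DaemonSet-managed pods. Pass the --ignore-daemonsets flag to let the drain continue while leaving those pods in place.
- PodDisruptionBudgets (PDB): When you drain a node, the evictions respect PDBs so that a minimum number of replicas remain running. An eviction that would violate a budget is refused, and kubectl drain retries it until it succeeds or the drain times out.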
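To make the PDB behavior concrete, here is a simplified sketch of the arithmetic for a budget that uses an absolute minAvailable value, followed by one way to create such a budget with kubectl. The function names, the PDB name web-pdb, and the label app=web are all hypothetical.

```shell
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Prints how many pods may be evicted right now under a minAvailable budget.
allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  # The count never goes negative; zero means evictions are refused.
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# create_min_available_pdb NAME SELECTOR MIN_AVAILABLE
# Creates a PDB that keeps at least MIN_AVAILABLE matching pods running.
create_min_available_pdb() {
  kubectl create poddisruptionbudget "$1" --selector="$2" --min-available="$3"
}
```

With three healthy replicas and a minimum of two, allowed_disruptions 3 2 prints 1, so a drain can evict one pod at a time. With only two healthy replicas it prints 0 and the drain waits. A matching budget could be created with create_min_available_pdb web-pdb app=web 2, and kubectl get pdb then shows the current value in its ALLOWED DISRUPTIONS column.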