Wednesday, May 27, 2026

k8s Pods rescheduling in case of node failure

Question

Consider a deployment where I have deployed, say, 10 pods across 5 nodes, and assume the pods are spread evenly so each node gets two pods. Now suppose two nodes suddenly go down (maybe their network switch has failed). How will k8s create those 4 pods on the remaining 3 nodes? What logic will it use to distribute 4 pods across 3 nodes?

Scenario Summary

  • Desired replicas: 10 pods
  • Nodes: 5 nodes (2 pods per node)
  • 2 nodes suddenly go down → 4 pods are lost
  • Remaining: 3 nodes
  • Kubernetes needs to create 4 new pods on the remaining 3 nodes

How Kubernetes Handles This

Kubernetes does not try to keep the exact previous distribution. Instead, it follows this process:

1. Detection Phase

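  • The Node Controller (the node lifecycle controller in kube-controller-manager) marks the 2 nodes NotReady after they stop reporting for node-monitor-grace-period (40s by default, raised to 50s in newer releases). It also taints them with node.kubernetes.io/unreachable.
  • The Pods on those nodes are marked not Ready. By default they tolerate the unreachable taint for 300 seconds (tolerationSeconds is injected by the DefaultTolerationSeconds admission plugin). After that they are evicted. kubectl may show them as Unknown or Terminating, because the kubelet on the dead node cannot confirm the deletion.
  • Pods marked for deletion no longer count as replicas. The ReplicaSet controller therefore sees 6 active pods against 10 desired replicas and creates 4 new Pods. With default settings, this happens roughly 5 to 6 minutes after the failure.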
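The counting rule in the last bullet can be sketched as a toy model. The names `Pod`, `active_replicas` and `replicas_to_create` are hypothetical, and the real controller is Go code inside kube-controller-manager. The sketch assumes the simplified rule that pods with a deletion timestamp, or in phase Succeeded/Failed, are not counted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    node: str
    # Hypothetical stand-in for metadata.deletionTimestamp: set once the pod is evicted.
    deletion_timestamp: Optional[float] = None
    phase: str = "Running"


def active_replicas(pods: List[Pod]) -> int:
    """Count the pods a ReplicaSet still treats as replicas (simplified).

    Pods marked for deletion or already Succeeded/Failed are skipped. A pod on an
    unreachable node that has not been evicted yet still counts, even if not Ready.
    """
    return sum(
        1
        for p in pods
        if p.deletion_timestamp is None and p.phase not in ("Succeeded", "Failed")
    )


def replicas_to_create(desired: int, pods: List[Pod]) -> int:
    """How many new Pods the controller would create to get back to `desired`."""
    return max(0, desired - active_replicas(pods))


# 10 pods, 2 per node on node-1 .. node-5.
pods = [Pod(f"pod-{n}-{i}", f"node-{n}") for n in range(1, 6) for i in range(2)]

# node-4 and node-5 just became unreachable: their pods are not evicted yet.
print(replicas_to_create(10, pods))  # 0

# After the default 300s toleration expires, those 4 pods are marked for deletion.
for p in pods:
    if p.node in ("node-4", "node-5"):
        p.deletion_timestamp = 350.0  # hypothetical timestamp, in seconds
print(replicas_to_create(10, pods))  # 4
```

The model shows why replacements are not created the instant a node goes NotReady. Until eviction, the old pods still occupy their replica slots.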

2. Scheduling Logic for the 4 New Pods

The kube-scheduler places the new pods one at a time, and decides where each one goes in two phases:

1. Feasibility (Can the pod be scheduled here?)

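  • Does the node have enough unrequested CPU/memory to fit the pod's resource requests?
  • Is the node Ready and schedulable? (NotReady and unreachable nodes are tainted, so in practice they fail the taint check below.)
  • Does the pod tolerate every NoSchedule/NoExecute taint on the node? Does it satisfy any required node affinity and hard (DoNotSchedule) topology spread constraints?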
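The filtering checks above can be approximated with a small sketch. `Node`, `PodSpec`, `feasible` and `filter_nodes` are hypothetical names, and taints and tolerations are simplified to exact "key:Effect" strings. Real toleration matching also supports operators, values and tolerationSeconds.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int          # millicores
    allocatable_mem_mi: int         # MiB
    requested_cpu_m: int = 0        # sum of requests of pods already on the node
    requested_mem_mi: int = 0
    taints: Set[str] = field(default_factory=set)  # "key:Effect" strings (simplified)


@dataclass
class PodSpec:
    cpu_m: int
    mem_mi: int
    tolerations: Set[str] = field(default_factory=set)  # "key:Effect" strings


def feasible(node: Node, pod: PodSpec) -> bool:
    """Simplified filter phase: resource fit plus taint/toleration check."""
    if node.requested_cpu_m + pod.cpu_m > node.allocatable_cpu_m:
        return False
    if node.requested_mem_mi + pod.mem_mi > node.allocatable_mem_mi:
        return False
    for taint in node.taints:
        effect = taint.split(":")[-1]
        if effect in ("NoSchedule", "NoExecute") and taint not in pod.tolerations:
            return False
    return True


def filter_nodes(nodes: List[Node], pod: PodSpec) -> List[str]:
    """Names of the nodes that pass every check, in input order."""
    return [n.name for n in nodes if feasible(n, pod)]


UNREACHABLE = "node.kubernetes.io/unreachable"
nodes = [Node(f"node-{i}", 4000, 8192, 1000, 2048) for i in (1, 2, 3)]
nodes += [
    Node(f"node-{i}", 4000, 8192, 0, 0,
         taints={f"{UNREACHABLE}:NoSchedule", f"{UNREACHABLE}:NoExecute"})
    for i in (4, 5)
]
# The default toleration covers only the NoExecute taint (for 300s), not NoSchedule.
pod = PodSpec(cpu_m=500, mem_mi=512, tolerations={f"{UNREACHABLE}:NoExecute"})
print(filter_nodes(nodes, pod))  # ['node-1', 'node-2', 'node-3']
```

The failed nodes have free capacity, yet they are still filtered out by their NoSchedule taint. That is why the new pods can only land on the 3 healthy nodes.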

2. Scoring (Which node is best?)

  • Least Requested / LeastAllocated (prefers nodes with more free resources)
  • Balanced Resource Allocation (prefers nodes where CPU and memory utilization stay balanced)
  • Preferred Node Affinity (if defined)
  • Topology Spread Constraints (very important here)
  • Inter-Pod Affinity / Anti-Affinity (if defined)
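To make "prefers nodes with more free resources" concrete, here is a simplified least-allocated score. It is loosely modelled on the idea behind the scheduler's LeastAllocated strategy: per resource, the free fraction after adding the pod, scaled to 0..100, then averaged. The function name and the equal weighting of CPU and memory are assumptions for this sketch, not the exact scheduler code.

```python
from typing import Dict

MAX_NODE_SCORE = 100  # scheduler plugins score nodes in the range 0..100


def least_allocated_score(allocatable: Dict[str, int],
                          requested: Dict[str, int],
                          pod_request: Dict[str, int]) -> int:
    """Simplified LeastAllocated-style score: more free capacity -> higher score.

    The incoming pod's request is added to what is already requested on the node,
    and every resource gets equal weight (an assumption for this sketch).
    """
    total = 0
    for resource, capacity in allocatable.items():
        used = requested.get(resource, 0) + pod_request.get(resource, 0)
        if capacity == 0 or used > capacity:
            continue  # this resource contributes 0
        total += (capacity - used) * MAX_NODE_SCORE // capacity
    return total // len(allocatable)


alloc = {"cpu": 4000, "memory": 8192}  # millicores, MiB
pod = {"cpu": 500, "memory": 512}
print(least_allocated_score(alloc, {"cpu": 1000, "memory": 2048}, pod))  # 65
print(least_allocated_score(alloc, {"cpu": 2000, "memory": 4096}, pod))  # 40
```

The less loaded node scores higher. In the real scheduler this score is combined with the other plugins' scores before a node is chosen.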

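Default behavior (without your own Topology Spread Constraints):

  • Kubernetes will still try to spread pods, but the result is not strictly even. The scheduler applies built-in default spreading constraints (maxSkew 3 per node, 5 per zone) only as soft preferences.
  • Likely distribution of the 4 new pods on 3 nodes: 2 + 1 + 1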

Final state example:

Original remaining pods: 6 (2 per remaining node)
New pods: 4

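Final distribution: most likely 4, 3, 3 (the 2 + 1 + 1 placement above). 4, 4, 2 is also possible if resource scoring favors two of the nodes (a 2 + 2 + 0 placement).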
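Both outcomes can be reproduced with a toy simulation that schedules the 4 pods one at a time, so each decision sees the earlier ones. Everything here is a hypothetical model: `place_sequentially`, the two scoring functions, and the CPU figures. Ties are broken by node name for reproducibility, whereas the real scheduler picks randomly among equally scored nodes.

```python
from typing import Callable, Dict

Scorer = Callable[[str, Dict[str, int], Dict[str, int]], int]


def place_sequentially(app_pods: Dict[str, int], free_cpu_m: Dict[str, int],
                       new_pods: int, pod_cpu_m: int, score: Scorer) -> Dict[str, int]:
    """Schedule `new_pods` one at a time; each decision sees the earlier placements."""
    counts = dict(app_pods)
    free = dict(free_cpu_m)
    for _ in range(new_pods):
        candidates = sorted(n for n in counts if free[n] >= pod_cpu_m)
        if not candidates:
            break  # the rest would stay Pending
        # max() keeps the first of equal scores, so ties go to the lowest node name.
        best = max(candidates, key=lambda n: score(n, counts, free))
        counts[best] += 1
        free[best] -= pod_cpu_m
    return counts


def fewest_app_pods(node: str, counts: Dict[str, int], free: Dict[str, int]) -> int:
    return -counts[node]  # spreading: fewer replicas of this app -> higher score


def most_free_cpu(node: str, counts: Dict[str, int], free: Dict[str, int]) -> int:
    return free[node]  # least-allocated: more free CPU -> higher score


start = {"node-1": 2, "node-2": 2, "node-3": 2}
# Identical nodes, spreading dominates: 2 + 1 + 1 new pods.
print(place_sequentially(start, {"node-1": 3000, "node-2": 3000, "node-3": 3000},
                         4, 500, fewest_app_pods))
# {'node-1': 4, 'node-2': 3, 'node-3': 3}

# node-3 also runs other workloads, so it has less free CPU: 2 + 2 + 0 new pods.
print(place_sequentially(start, {"node-1": 3000, "node-2": 3000, "node-3": 1000},
                         4, 500, most_free_cpu))
# {'node-1': 4, 'node-2': 4, 'node-3': 2}
```

The real scheduler blends several scores rather than using just one. Even so, the sketch shows why the final layout depends on how loaded each surviving node already is.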


Best Way to Control Distribution: Use Topology Spread Constraints

Here’s the recommended way to make Kubernetes behave more predictably:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                      # Very important
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
        # Optional: also spread across zones if you have them
        - maxSkew: 2
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:1.0               # placeholder, replace with your image

With maxSkew: 1 in the above case:

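  • After the 4 pods are rescheduled, Kubernetes will try its best to reach the most even possible distribution → 4, 3, 3 (a skew of 1 between the fullest and emptiest node).
  • Because whenUnsatisfiable is ScheduleAnyway, this is a strong scoring preference rather than a guarantee. DoNotSchedule turns it into a hard rule.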
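Skew is simply the difference between the most and least populated topology domains. For a DoNotSchedule constraint, the documented check is: pods already in the candidate's domain + 1 − global minimum ≤ maxSkew. The helper names below are hypothetical.

The last case rests on an assumption worth verifying on your cluster version. If the failed nodes are still counted as domains with 0 matching pods, no live node passes a maxSkew: 1 check. Terminating pods are not counted, so dead nodes look empty. My understanding is that this happens with the default nodeTaintsPolicy: Ignore. Setting nodeTaintsPolicy: Honor excludes nodes whose taints the pod does not tolerate.

```python
from typing import Dict


def skew(counts: Dict[str, int]) -> int:
    """Skew across domains: max matching pods minus min matching pods."""
    return max(counts.values()) - min(counts.values())


def fits_max_skew(counts: Dict[str, int], candidate: str, max_skew: int) -> bool:
    """DoNotSchedule-style check: would one more pod on `candidate` keep skew <= max_skew?"""
    return counts[candidate] + 1 - min(counts.values()) <= max_skew


print(skew({"node-1": 4, "node-2": 3, "node-3": 3}))  # 1
print(skew({"node-1": 4, "node-2": 4, "node-3": 2}))  # 2

# Midway through rescheduling: node-1 already received one new pod.
counts = {"node-1": 3, "node-2": 2, "node-3": 2}
print([n for n in counts if fits_max_skew(counts, n, 1)])  # ['node-2', 'node-3']

# Assumption: the two failed nodes are still counted as empty domains.
with_dead = {"node-1": 2, "node-2": 2, "node-3": 2, "node-4": 0, "node-5": 0}
print([n for n in ("node-1", "node-2", "node-3") if fits_max_skew(with_dead, n, 1)])  # []
```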

Summary of Logic Kubernetes Uses

Factor                         Influence on distribution
Resource requests              Strong (avoids overloading nodes)
Topology Spread Constraints    Strongest control for even spread
Least Requested scoring        Prefers less loaded nodes
Node Affinity                  Can bias toward specific nodes
Default scheduler              Good spread, but not perfect

Practical Tips

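  1. Always define topologySpreadConstraints in production, especially maxSkew: 1. If you switch to whenUnsatisfiable: DoNotSchedule, consider also setting nodeTaintsPolicy: Honor. Otherwise, tainted NotReady nodes may be counted as empty domains and block the replacement pods.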
  2. Set reasonable resources.requests so nodes don’t get overloaded.
  3. You can monitor rescheduling with:
kubectl get pods -o wide -w
kubectl get events --sort-by=.lastTimestamp

Next Question

What happens if, due to certain rules, it is not possible for k8s to create those 4 pods at all?

This is a very important scenario. Here's exactly what happens when Kubernetes cannot schedule those 4 pods due to constraints.


What Happens When Scheduling Fails?

When Kubernetes cannot find suitable nodes for the pending pods, the following occurs:

  1. Pods enter Pending state
    • The ReplicaSet still wants 10 pods, but only 6 are running.
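    • The 4 new pods stay in Pending status until the situation changes: a failed node recovers, a new node joins, or enough resources are freed. The scheduler retries them automatically.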
  2. No automatic "force scheduling"
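    • Kubernetes will not kill existing pods of equal or higher priority to make space. Only pods with a higher PriorityClass can preempt lower-priority pods.
    • It will not violate hard constraints, such as required node affinity, untolerated taints, or topology spread constraints with whenUnsatisfiable: DoNotSchedule.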
  3. The cluster remains in an under-capacity state
    • You will have only 6 running pods instead of 10.
    • Your application will run with reduced capacity.
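A resource-driven version of this failure can be sketched as follows. The sizing is a hypothetical assumption: 2000m allocatable per node, 800m requested per pod, so the 2 surviving pods per node leave only 400m free. `schedule_or_pend` is a made-up helper. It never evicts running pods, which matches the equal-priority case above.

```python
from typing import Dict, Tuple


def schedule_or_pend(free_cpu_m: Dict[str, int], new_pods: int,
                     pod_cpu_m: int) -> Tuple[Dict[str, int], int]:
    """Place pods on the node with the most free CPU; count those that cannot fit.

    Returns (new pods placed per node, pods left Pending). Nothing already running
    is evicted to make room.
    """
    free = dict(free_cpu_m)
    placed = {node: 0 for node in free}
    pending = 0
    for _ in range(new_pods):
        fits = [n for n in sorted(free) if free[n] >= pod_cpu_m]
        if not fits:
            pending += 1  # no feasible node: the pod stays Pending
            continue
        best = max(fits, key=lambda n: free[n])
        free[best] -= pod_cpu_m
        placed[best] += 1
    return placed, pending


# Hypothetical sizing: each surviving node has only 400m CPU left.
placed, pending = schedule_or_pend({"node-1": 400, "node-2": 400, "node-3": 400}, 4, 800)
running = 6 + sum(placed.values())
print(running, pending)  # 6 4
```

In a real cluster, `kubectl describe pod` on one of the Pending pods shows the scheduler's reason in its events, such as insufficient CPU or an untolerated taint.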
