Wednesday, May 27, 2026

Distributing PODs across nodes, as per the capabilities of nodes

Question

I have a frontend Service. I have a good server machine with 64 GB RAM and two desktops with 16 GB RAM each. I want to deploy 15 pods of my frontend service so that the server runs 9 pods and each desktop runs 3. How do I instruct Kubernetes (k8s) to do this?

This is a classic pod scheduling requirement where you want uneven distribution across nodes.

Recommended Solution

The cleanest and most maintainable way is to:

  1. Label your nodes to identify them.
  2. Use two separate Deployments, each with its own replica count and a nodeSelector that pins it to one node type.

Step-by-Step Implementation

Step 1: Label your nodes

Run these commands:

# Label the powerful server
kubectl label node <server-node-name> node-type=powerful --overwrite

# Label the two desktops
kubectl label node <desktop1-node-name> node-type=normal --overwrite
kubectl label node <desktop2-node-name> node-type=normal --overwrite

You can check node names with:

kubectl get nodes -o wide

Step 2: Create Two Deployments

Deployment 1 – For the powerful server (9 pods):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-powerful
spec:
  replicas: 9
  selector:
    matchLabels:
      app: frontend
      tier: powerful
  template:
    metadata:
      labels:
        app: frontend
        tier: powerful
    spec:
      nodeSelector:
        node-type: powerful
      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"

Deployment 2 – For the two desktops (6 pods in total, ideally 3 on each; nodeSelector alone does not guarantee the even split):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-normal
spec:
  replicas: 6
  selector:
    matchLabels:
      app: frontend
      tier: normal
  template:
    metadata:
      labels:
        app: frontend
        tier: normal
    spec:
      nodeSelector:
        node-type: normal
      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"

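Both Deployments put the label app: frontend on their pods, so they belong to the same logical application and one Service selecting app: frontend can expose all 15 pods. Each Deployment's selector should also include a distinguishing label (for example tier: powerful and tier: normal), because the Kubernetes documentation advises against controllers whose selectors overlap.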
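Since a single Service for all 15 pods is mentioned but not shown, here is a minimal sketch of one, written to a file from the shell so it can be reviewed before applying. The Service name frontend, the file name frontend-service.yaml, and both port numbers are assumptions; the original does not say which port the container listens on.

```shell
# Write a Service manifest that selects every pod labelled app=frontend,
# regardless of which Deployment created it.
# ASSUMPTIONS: the name, the file name, port 80 and targetPort 8080 are placeholders.
cat > frontend-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080
EOF

# Review the file, then apply it from a machine with cluster access:
#   kubectl apply -f frontend-service.yaml
echo "wrote frontend-service.yaml"
```

The selector deliberately uses only app: frontend, so pods from both Deployments become endpoints of the same Service.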


Alternative Approaches

Method                       | Pros                          | Cons                                 | Recommendation
Multiple Deployments (above) | Simple, exact control (9 & 6) | Two Deployments to manage            | Best for your case
Node Affinity (Preferred)    | Single Deployment             | Harder to guarantee exact 9/3/3      | Good but less precise
Topology Spread Constraints  | Built-in spreading            | Designed for even spread, not uneven | Not ideal here
nodeName (hardcoding)        | Very direct                   | Brittle, not scalable                | Avoid

Important Tips

  1. Resource Requests are very important here. Because your desktops have only 16GB RAM, set realistic requests so the scheduler doesn't overload them.
  2. You can combine nodeSelector with Node Affinity for more flexibility (soft preference).
  3. Monitor with:
kubectl get pods -o wide
kubectl describe node <node-name>
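To make tip 1 concrete, the arithmetic below totals the 1Gi requests and 2Gi limits from the manifests above for the planned pod counts. It is a rough sketch only: it assumes that 16 GB and 64 GB mean 16384 MiB and 65536 MiB, and it ignores memory reserved for the operating system and the kubelet, so the allocatable figure reported by kubectl describe node will be lower.

```shell
# budget <pods> <per_pod_mib> <node_mib>
# Prints the total memory for <pods> pods and its share of the node's memory
# (integer percentage, rounded down).
budget() (
  total=$(( $1 * $2 ))
  echo "${total}MiB $(( total * 100 / $3 ))%"
)

budget 3 1024 16384   # desktop, requests (3 x 1Gi) -> 3072MiB 18%
budget 3 2048 16384   # desktop, limits   (3 x 2Gi) -> 6144MiB 37%
budget 9 1024 65536   # server,  requests (9 x 1Gi) -> 9216MiB 14%
budget 9 2048 65536   # server,  limits   (9 x 2Gi) -> 18432MiB 28%
```

Under these assumptions, three pods at their limits would take a bit over a third of a desktop's memory, which is worth comparing against whatever else runs on those machines.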

Second Approach: Node Affinity

Here's the Node Affinity version (single Deployment approach):

Step 1: Label your nodes (same as before)

kubectl label node <server-node-name>   node-type=powerful --overwrite
kubectl label node <desktop1-node-name> node-type=normal --overwrite
kubectl label node <desktop2-node-name> node-type=normal --overwrite

Step 2: Single Deployment with Node Affinity

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 15
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - powerful

          - weight: 30
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - normal

      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"

How this works:

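  • The powerful server matches the term with the higher weight (100), so the scheduler scores it higher and prefers it for each pod.
  • The two desktops match the lower-weight term (30), so they are chosen less often, but they can still receive pods.
  • The weights are scoring preferences, not a ratio. The affinity score is combined with the scheduler's other scoring rules (such as resource usage and default spreading), so the split is not guaranteed. You may see something close to 9~11 pods on the server and 2~3 pods on each desktop, but the exact numbers depend on the current cluster state.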
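One way to read the weights is as relative scores rather than pod counts. The sketch below is a simplified model of the node-affinity part of scoring only: each node's matching weights are summed, then scaled so the best node gets 100. It is an illustration under that assumption, not the scheduler's actual code, and the function name affinity_scores and the node names are placeholders. The real scheduler adds scores from other plugins, which is why the final placement can differ.

```shell
# affinity_scores: reads "<node> <summed-weight>" lines on stdin and prints
# "<node> <score>", with scores scaled so the highest weight maps to 100.
affinity_scores() {
  awk '
    { name[NR] = $1; w[NR] = $2; if ($2 > max) max = $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %d\n", name[i], (max > 0 ? int(w[i] * 100 / max) : 0)
    }'
}

# Weights from the Deployment above: 100 for powerful, 30 for normal.
printf '%s\n' "server 100" "desktop1 30" "desktop2 30" | affinity_scores
```

In this model the weights 90 and 20 would scale to 100 and 22, so only the ratio between the two weights matters, not their absolute values.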

Third Approach: Node Affinity + Topology Spread Constraints

Here's the advanced version combining Node Affinity + Topology Spread Constraints in a single Deployment.

Advanced YAML (Node Affinity + Topology Spread)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 15
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 90
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - powerful

          - weight: 20
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - normal

      topologySpreadConstraints:
      - maxSkew: 4
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: frontend

      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "2Gi"

Final Recommendation

Even though the advanced version is powerful, for your exact requirement (9 on server, 3 on each desktop), the two separate Deployments approach remains the most reliable and production-friendly solution.

It gives:

  • Exact control over pod count
  • Predictable scheduling behavior
  • Cleaner troubleshooting
  • Independent scaling if needed later
  • Safer resource management for weaker nodes

1. How to monitor the actual distribution after applying this?

Use these commands to see how pods are distributed across your nodes:

Best Command (Recommended):

kubectl get pods -o wide -l app=frontend

This shows which node each pod is running on.

More Detailed Distribution Summary:

kubectl get pods -l app=frontend -o wide --no-headers | awk '{print $7}' | sort | uniq -c

Even better one-liner for nice summary:

echo "=== Pod Distribution ===" && \
kubectl get pods -l app=frontend --no-headers -o custom-columns="NODE:.spec.nodeName" | \
sort | uniq -c | sort -nr
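To see what the summary should look like when the target is met, the same kind of counting can be tried on sample data without a cluster. The function name count_by_node and the node names server, desktop1, and desktop2 are placeholders.

```shell
# count_by_node: reads one node name per line on stdin and prints
# "<count> <node>", sorted by count (highest first), then by node name.
count_by_node() {
  awk 'NF { c[$1]++ } END { for (n in c) print c[n], n }' | sort -k1,1nr -k2,2
}

# Sample input imitating a 9/3/3 placement.
{
  for i in 1 2 3 4 5 6 7 8 9; do echo server; done
  for i in 1 2 3; do echo desktop1; echo desktop2; done
} | count_by_node
```

Against a real cluster, the node names would come from the command above instead of the sample loop: kubectl get pods -l app=frontend --no-headers -o custom-columns="NODE:.spec.nodeName" | count_by_node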

Watch in real-time:

watch -n 3 "kubectl get pods -l app=frontend --no-headers -o custom-columns='POD:.metadata.name,NODE:.spec.nodeName,STATUS:.status.phase' | sort -k2"

2. Commands to test and fine-tune weights / skew

Here are useful commands to experiment:

Check current scheduling decisions:

kubectl describe pod <pod-name> | grep -A 10 "Node:"

Fine-tuning commands:

A. Increase preference for powerful node (increase weight):

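  • Raise the powerful node's weight toward 100, which is the maximum (valid weights are 1 to 100), or lower the weight for the normal nodes, for example from 20 to 10.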

B. Make distribution stricter (reduce maxSkew):

  • Change maxSkew: 4 → maxSkew: 2 (more even distribution)
  • Change maxSkew: 4 → maxSkew: 6 (allows more uneven)
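To judge which maxSkew values fit the target, it helps to compute the skew of a distribution: the difference between the node with the most matching pods and the node with the fewest. The helper below is a hypothetical convenience for that arithmetic, not a kubectl feature.

```shell
# skew <count> [<count> ...]: prints max(counts) - min(counts)
skew() (
  max=$1; min=$1
  for n in "$@"; do
    if [ "$n" -gt "$max" ]; then max=$n; fi
    if [ "$n" -lt "$min" ]; then min=$n; fi
  done
  echo $(( max - min ))
)

skew 9 3 3   # target distribution -> 6
skew 7 4 4   # -> 3
skew 5 5 5   # perfectly even -> 0
```

The target 9/3/3 has a skew of 6, so any maxSkew below 6 nudges the scheduler toward a more even layout than the target. With whenUnsatisfiable: ScheduleAnyway this is a preference rather than a hard limit.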

C. Test changes quickly:

# Scale down and up to force rescheduling
kubectl scale deployment frontend --replicas=0
kubectl scale deployment frontend --replicas=15

D. Check scheduler events:

kubectl get events --sort-by=.metadata.creationTimestamp | tail -20

Pro Tip: Start with:

  • weight: 100 for powerful node
  • weight: 20 for normal nodes
  • maxSkew: 3

Then adjust based on the monitoring output.


3. Best of Both Worlds: Two Deployments + Topology Spread

This is my strong recommendation for your use case (9 on server, 3 on each desktop).

Deployment 1: Powerful Server (9 pods)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-powerful
spec:
  replicas: 9
  selector:
    matchLabels:
      app: frontend
      tier: powerful
  template:
    metadata:
      labels:
        app: frontend
        tier: powerful
    spec:
      nodeSelector:
        node-type: powerful
      topologySpreadConstraints:
      - maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: frontend
      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "600m"
            memory: "1.2Gi"
          limits:
            cpu: "1200m"
            memory: "2.5Gi"

Deployment 2: Normal Desktops (Total 6 pods)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-normal
spec:
  replicas: 6
  selector:
    matchLabels:
      app: frontend
      tier: normal
  template:
    metadata:
      labels:
        app: frontend
        tier: normal
    spec:
      nodeSelector:
        node-type: normal
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: frontend
      containers:
      - name: frontend
        image: your-frontend-image:latest
        resources:
          requests:
            cpu: "400m"
            memory: "800Mi"
          limits:
            cpu: "800m"
            memory: "1.5Gi"

Why This is the Best Approach:

  • Exact control: You get exactly 9 on server, 6 across desktops.
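  • Topology Spread steers the scheduler away from an unbalanced split across the desktops (e.g., 5 + 1). With whenUnsatisfiable: ScheduleAnyway this is a preference; use DoNotSchedule if the 3 + 3 split must be enforced.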
  • Different resource requests (higher on powerful node).
  • One Service can still target all pods using app: frontend.
