PSKE - Cluster Autoscaling

The Cluster Autoscaler, also known as the Horizontal Node Autoscaler (HNA), is a tool that automatically adjusts the number of worker nodes in a Kubernetes cluster under the following conditions:

Pods cannot start because of insufficient resources.
Worker nodes have been underutilized for a set period (30 minutes by default), and their pods can be rescheduled onto other worker nodes.

Prerequisites

To install the Horizontal Node Autoscaler in a Shoot Cluster, the “Autoscaler Min.” and “Autoscaler Max.” values must be defined in at least one worker group.

“Autoscaler Min.” defines the minimum number of worker nodes within the worker group.
“Autoscaler Max.” defines the maximum number of worker nodes that the Horizontal Node Autoscaler will provide in the event of resource shortages within a worker group.
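
For readers who manage the cluster through a Gardener Shoot manifest instead of the dashboard, the two values plausibly correspond to the minimum and maximum fields of a worker group. The fragment below is only a sketch under that assumption. The worker group name and the numbers are illustrative, chosen to match the simulation on this page, and the other required worker fields are omitted.

```yaml
# Sketch of a Gardener Shoot fragment. Assumption: PSKE's "Autoscaler Min." and
# "Autoscaler Max." map to the minimum/maximum fields of a worker group.
spec:
  provider:
    workers:
    - name: worker-jh07p   # hypothetical worker group name
      minimum: 1           # "Autoscaler Min.": never scale below this many nodes
      maximum: 3           # "Autoscaler Max.": never scale above this many nodes
      # machine type, image, volume, and zones omitted for brevity
```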

Simulation

Currently, the Shoot Cluster has one worker node:

kubectl describe node shoot--ldtivqit95-worker-jh07p-z1-7d897-cgrw6
Capacity:
  cpu:                2
  ephemeral-storage:  50633164Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             4030532Ki
  pods:               110
Allocatable:
  cpu:                1920m
  ephemeral-storage:  49255941901
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             2879556Ki
  pods:               110
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1047m (54%)   0 (0%)
  memory             1120Mi (39%)  18788Mi (668%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)

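An NGINX Deployment that requests 2048Mi of memory is created. The command is shown first, followed by the contents of deployment.yaml:

kubectl apply -f deployment.yaml

# deployment.yaml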
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "2048Mi"
            # cpu: "500m"
          limits:
            memory: "2048Mi"
            # cpu: "500m"

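The existing worker node cannot fit the pod: about 2812Mi of memory is allocatable (2879556Ki), 1120Mi of that is already requested, and the remaining 1692Mi or so is less than the 2048Mi the pod requests. The pod stays Pending, which triggers the cluster-autoscaler (Horizontal Node Autoscaler):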

kubectl describe pod nginx-54c7fd947f-b2k67
Events:
  Type     Reason            Age   From                Message
  ----     ------            ----  ----                -------
  Warning  FailedScheduling  37s   default-scheduler   0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
  Normal   TriggeredScaleUp  26s   cluster-autoscaler  pod triggered scale-up: [{shoot--ldtivqit95-worker-jh07p-z1 1->2 (max: 3)}]

After the Deployment is deleted, the additional worker node is no longer needed. The Cluster Autoscaler (Horizontal Node Autoscaler) deprovisions it once it has been underutilized for 30 minutes.

Best Practices

Do not manually modify nodes that are part of an autoscaling group. All nodes in the same node group should have the same capacity and labels.
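Set resource requests on all containers. The autoscaler decides whether a pod fits on a node based on its requests, not on its actual usage.
Use PodDisruptionBudgets, where needed, to limit how many pods may be evicted at once when a node is removed.
Ensure that your cloud provider’s quota is sufficient before setting the min/max values for the Horizontal Node Autoscaler.
Avoid running additional node group autoscalers alongside it (including those offered by your cloud provider).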
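
As an illustration of the PodDisruptionBudget recommendation, the manifest below is a minimal sketch for the NGINX Deployment used in the simulation. The name nginx-pdb and the minAvailable value are assumptions chosen for the example, and it presumes the Deployment has been scaled to two or more replicas.

```yaml
# Minimal sketch: keep at least one nginx pod running during voluntary
# evictions such as a node drain on scale-down.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nginx-pdb          # hypothetical name
spec:
  minAvailable: 1          # illustrative value, tune per workload
  selector:
    matchLabels:
      app: nginx           # matches the pod label from deployment.yaml
```

With a single replica, minAvailable: 1 would block every voluntary eviction, so the node hosting that pod could not be drained for scale-down. The budget is only useful when more replicas exist than it protects.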

Conclusion

As the simulation shows, the Horizontal Node Autoscaler adds worker nodes when pods cannot be scheduled and removes them again once they are underutilized. Following the best practices above is essential for the Cluster Autoscaler to work as intended. The HNA is enabled by default in PSKE and is available for your use.

Last modified 12.11.2024: Changing to new picture schema, removing aws docs, adding new screenshots (8887530)