The Vertical Pod Autoscaler (VPA) is an extension of the Kubernetes API that provides vertical scaling for Kubernetes controllers such as Deployments and their Pods. Its operation is more complex than that of the Horizontal Pod Autoscaler (HPA) because it optimizes the resource request parameters of Pods based on metrics collected from workloads. “Requests” are declarative statements of the minimum resources required by one or more containers within a Pod. Higher values reserve more CPU or RAM for the Pod when it is scheduled. If a workload is found to be consuming more resources than its specification indicates, the VPA calculates a new set of appropriate values within its configured bounds.
Two types of resource settings can be specified for each container within a Pod, and the VPA works with both: Requests and Limits.
Requests / Limits:
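As an illustration of the two settings, here is a minimal sketch of a Deployment whose single container declares both. The label `my-app`, the container name `app`, the image, and all quantities are placeholders chosen for this example, not values taken from a real workload.

```yaml
# Hypothetical Deployment; only the "resources" block matters here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app            # placeholder label
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app          # placeholder container name
          image: nginx:1.25  # placeholder image
          resources:
            requests:        # minimum the scheduler reserves for the container
              cpu: 100m
              memory: 128Mi
            limits:          # ceiling the container may not exceed
              cpu: 500m
              memory: 256Mi
```

The `requests` values are what the VPA rewrites when it applies a recommendation.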
The VPA periodically queries the Kubernetes API for a Pod’s resource utilization and adjusts the Pod’s resource requests as needed to match the observed usage. Unlike the HPA, it does not change the number of replicas. In detail:
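UpdateMode: Depending on how the VPA is configured, it runs in one of the following modes: Off (recommendations are calculated but never applied), Initial (requests are set only when a Pod is created), Recreate (Pods are evicted and recreated when their requests differ significantly from the recommendation), and Auto (currently equivalent to Recreate).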
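A cautious way to start is a recommendation-only configuration. The sketch below assumes the VPA custom resource definitions are installed in the cluster and reuses the object names from this article. It sets the update mode to "Off", so the VPA calculates recommendations without changing running Pods.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-deployment-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  updatePolicy:
    updateMode: "Off"   # calculate recommendations only; Pods are left untouched
```

Once the recommendations look plausible, the mode can be switched to one that applies them.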
Here is an example of how to use VPA in a YAML configuration:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-deployment-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-deployment
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        controlledResources:
          - cpu
          - memory
        maxAllowed:
          cpu: 1
          memory: 500Mi
        minAllowed:
          cpu: 100m
          memory: 50Mi
  updatePolicy:
    updateMode: "Auto"
Like HPA, VPA is a powerful tool but has some limitations and may not be ideal for every use case. It cannot solve every problem with cluster resources, and some considerations include:
There are several ways to test the Kubernetes Vertical Pod Autoscaler (VPA):
Using a Load-Testing Tool: One way to test VPA is to use a load-testing tool like Apache JMeter or Gatling to generate load on an application. You can then observe how VPA responds by raising or lowering the resource requests of the Pods based on their resource consumption.
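Using the “kubectl” Command: You can manually increase or decrease the number of Pod replicas using “kubectl” and observe how VPA responds. Under a steady total load, fewer replicas mean more work per Pod, so the recommended requests should rise, while more replicas should have the opposite effect. For example, “kubectl scale” can be used to adjust the number of replicas of a Deployment or ReplicationController, and “kubectl describe vpa” shows the current recommendation.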
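When a full load-testing tool is more than you need, a deliberately under-provisioned workload can serve as a quick check. The sketch below is hypothetical: the name `vpa-test-burner`, the busybox image tag, and the busy loop are assumptions made for illustration. Each container spins in a shell loop while requesting only a small amount of CPU, so a VPA whose targetRef points at this Deployment should eventually recommend a higher CPU request.

```yaml
# Hypothetical test workload: uses far more CPU than it requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vpa-test-burner
spec:
  replicas: 2                # more than one replica, so an eviction does not stop the whole workload
  selector:
    matchLabels:
      app: vpa-test-burner
  template:
    metadata:
      labels:
        app: vpa-test-burner
    spec:
      containers:
        - name: burner
          image: busybox:1.36
          # Endless shell loop that keeps one CPU core busy.
          command: ["sh", "-c", "while true; do :; done"]
          resources:
            requests:
              cpu: 100m      # intentionally too low for the actual usage
              memory: 50Mi
```

Recommendations are based on collected metrics, so expect a delay of at least a few minutes before they reflect the load.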
VPA and HPA can conflict, especially when both are configured to scale based on the same resource, such as RAM. This can lead to both trying to scale workloads vertically and horizontally simultaneously, resulting in unpredictable outcomes. To avoid such conflicts, it is best practice to have HPA focus on different metrics than VPA, such as using custom metrics for HPA while VPA scales based on CPU or RAM.
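One possible way to keep the two autoscalers apart is sketched below: an HPA that scales on a per-Pod custom metric and leaves CPU and RAM to the VPA. The metric name `http_requests_per_second`, the target value, and the replica bounds are hypothetical. The sketch also assumes that a custom metrics adapter is installed and exposes that metric.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"              # illustrative target per Pod
```

Because this HPA reads neither CPU nor memory, a change in requests made by the VPA does not feed back into its scaling decision.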
Once VPA is set up, you can monitor it through the Kubernetes API or using monitoring tools like Prometheus, Grafana, or Kubernetes Dashboard.
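When reading the VPA object through the Kubernetes API, for example with “kubectl get vpa my-deployment-vpa -o yaml”, the recommendation appears in the status section. The fragment below shows only the general shape of that section. The container name and every quantity are made up for illustration.

```yaml
# Illustrative status fragment of a VerticalPodAutoscaler; all values are invented.
status:
  recommendation:
    containerRecommendations:
      - containerName: app
        lowerBound:          # below this, requests are considered too small
          cpu: 100m
          memory: 128Mi
        target:              # the values the VPA would apply
          cpu: 250m
          memory: 256Mi
        uncappedTarget:      # the target before minAllowed/maxAllowed are applied
          cpu: 250m
          memory: 256Mi
        upperBound:          # above this, requests are considered wasteful
          cpu: "1"
          memory: 500Mi
```

Comparing `target` with the requests currently set on the Pods is a simple way to see whether the VPA has acted yet.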