Choosing accurate CPU and memory request values for Kubernetes workloads is difficult. As a result, application developers often overprovision their workloads to ensure that application performance is not affected, which increases cloud costs and wastes resources. Workloads can also be inadvertently underprovisioned, which can degrade application performance and potentially even lead to service disruptions.
In this blog, we describe how the Kubernetes Vertical Pod Autoscaler (VPA) can be leveraged in conjunction with Luna, a powerful cluster autoscaler, so that Kubernetes workloads are right-sized by VPA while the cluster and its nodes are right-sized by Luna, resulting in cost-effective and performant operations.
Overview of VPA
The Vertical Pod Autoscaler in Kubernetes leverages the CPU and memory usage history of managed workloads to recommend resource request values for containers and, optionally, to update a container’s resource requests automatically. Workloads that can be vertically scaled using VPA include Deployments, StatefulSets, DaemonSets, and Custom Resources that define the scale subresource. VPA uses the Kubernetes Metrics Server to monitor and track CPU and memory resource usage.
VPA is implemented as a Custom Resource in Kubernetes. An instance of the custom resource needs to be created for each workload that the user would like to manage or vertically autoscale. VPA can be used in four different modes: Off, in which VPA only computes and publishes recommendations without applying them; Initial, in which recommendations are applied only when pods are created; Recreate, in which VPA evicts running pods so that they restart with updated requests; and Auto, which currently behaves like Recreate and is intended to use in-place updates once that mechanism is available.
VPA: Under the hood
The Vertical Pod Autoscaler consists of three components running in the kube-system namespace: the vpa-recommender, which computes resource recommendations; the vpa-updater, which evicts pods whose requests deviate significantly from the recommendations; and the vpa-admission-controller, which applies recommendations to pods at creation time.
The VPA recommender calculates resource requests using a decaying histogram of monitored CPU and memory usage metrics. In a decaying histogram, the weight of each metric value decreases over time. By default, a historical CPU usage sample loses half of its weight in 24 hours; this default can be changed using the flag --cpu-histogram-decay-half-life. The frequency at which CPU and memory metrics are fetched defaults to 1 minute and can be changed using the flag --recommender-interval. An extensive list of other flags to customize the vpa-recommender is documented here: VPA-recommender flags. A detailed description of the margins and confidence intervals that are applied on top of the decaying histogram technique can be found in this CNCF blog post: Optimizing VPA responsiveness and here.
The VPA object for each Kubernetes resource can also be configured to provide recommendations for both CPU and memory or for just one of these resources, using the controlledResources parameter in the VPA object. Note that using VPA together with the Horizontal Pod Autoscaler on the same resource is not recommended. More details about this limitation can be found in these references: VPA design docs, Known Limitations of VPA, and VPA on GKE Limitations. Let’s look at an example of a VPA object:
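A minimal sketch of such a VPA object, with illustrative field values, looks like this:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: workload-a-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload-a
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # Restrict VPA to one or both of cpu/memory
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
```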
The targetRef field refers to the Kubernetes resource that this VPA object manages, which in this case is a deployment named “workload-a”. The updatePolicy field can be one of the four modes listed in the Overview section: Off, Initial, Auto or Recreate. The minAllowed and maxAllowed fields are used to set the absolute minimum and maximum values that the VPA can recommend. This prevents excessive resource usage as well as resource starvation for pods and can help to keep performance and cost within acceptable bounds.
Let’s now look at an example of a recommendation within the VPA object after it begins operation:
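A representative status section, with illustrative values, is sketched below:

```yaml
status:
  recommendation:
    containerRecommendations:
      - containerName: workload-a
        lowerBound:
          cpu: 250m
          memory: 262144k
        target:
          cpu: 587m
          memory: 262144k
        uncappedTarget:
          cpu: 587m
          memory: 262144k
        upperBound:
          cpu: "2"
          memory: 500Mi
```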
In the above snippet, Target refers to the recommended values of the CPU and memory requests for the container named “workload-a”. It corresponds to the 90th percentile (by default) of the decaying histogram of observed peak usage values. This percentile can be configured using the flags --target-cpu-percentile and --target-memory-percentile when starting the vpa-recommender.
Uncapped Target refers to the recommended CPU and memory request values for the same container without taking into consideration the maxAllowed value in the spec section of the VPA custom resource object. The Lower Bound and Upper Bound values correspond to the 50th and 95th percentiles of the decaying histogram, respectively, and these can be configured with the flags --recommendation-lower-bound-cpu-percentile and --recommendation-upper-bound-cpu-percentile (with analogous flags for memory).
VPA: Better Together with Cluster Autoscaling
In this section, let’s look at how Vertical Pod Autoscaling and cluster autoscaling complement each other. VPA can be utilized to right-size pods that are initially either overprovisioned or underprovisioned. We delve into each of these cases and show how a cluster autoscaler can help with both.
Application Under-Provisioning
When a pod is underprovisioned, VPA recommends larger resource values than its current allocation. In this case, the current cluster nodes may not be able to accommodate the updated pod, which can leave pods stuck in the Pending state. In such a case, having an intelligent Kubernetes cluster autoscaler, like Luna, becomes critical to keep the application or service running without interruption. Luna automates the addition of a right-sized cluster node to accommodate these pending pods (which were recreated because of the actions of the vpa-updater).
Additionally, Luna places pods on nodes via two techniques: bin packing, in which pods with resource requests below configured thresholds are packed together onto shared nodes, and bin selection, in which larger pods are each placed on a dedicated node sized to fit them.
When an underprovisioned pod’s resources are increased by VPA, a bin-pack designated pod may become a bin-select designated pod. In this case, Luna automatically detects this change and places the pod appropriately on a bin-select node. We illustrate this via an experiment in the section “Experiment 4: VPA and Luna Interoperation to Handle Pod Under-provisioning”.
Application Over-Provisioning
When a pod is overprovisioned, VPA recommends smaller resource values than its current allocation. In this case, since the pod’s resource request is smaller, total cluster capacity will not need to change - i.e. the cluster will continue to be able to accommodate the updated pod.
However, the decrease in resource requests could change the designation of a pod from bin-select to bin-pack. In this case, the pod, after restart, will be placed on a bin-pack node by Luna. The bin-select node will automatically be scaled in (deleted) if no other pods are running on it. A detailed experiment of this scenario is described in the section “Experiment 3: VPA and Luna Interoperation to Handle Pod Over-provisioning”.
VPA & Luna Interoperability Experiments
In this section, we detail a number of experiments to showcase how VPA and Luna interoperate under different operational conditions and modes.
Experiment 1: Interoperation of VPA in “Auto mode” and Luna
In this experiment, we illustrate an example where VPA recommends increased resources to a managed deployment. Luna promptly detects that the pod recreated by the vpa-updater cannot be accommodated as-is in the current cluster and hence adds a new node to the cluster and places the restarted pod on this new node.
When Luna and VPA (in Auto mode) are used together, their admission webhooks need to be executed in the correct order: the VPA admission controller first adjusts the pod’s resource values, and Luna’s admission webhook then uses the updated values to choose an appropriate node. Luna provides a configuration parameter, called webhookConfigPrefix, to enforce this ordering.
1. Initial Setup
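As a sketch (the Helm chart reference and prefix value here are hypothetical), the parameter could be set as follows; Kubernetes invokes mutating webhooks in alphabetical order of their configuration names, so a prefix that sorts after the VPA webhook’s name yields the desired ordering:

```shell
# Hypothetical example: choose a prefix for Luna's webhook configuration
# that sorts alphabetically after the VPA admission webhook's name.
helm upgrade --install luna elotl/luna \
  --set webhookConfigPrefix=zz-luna-
```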
Two deployments, “workload-A” and “workload-B”, are running on 2 nodes in a Luna-enabled EKS cluster.
2. Starting a VPA managed workload
A third deployment, workload-C, managed by VPA is created on this cluster.
The VPA custom-resource is seen below.
We see that the CPU and memory request values are not immediately available.
Initially, workload-C is placed by Luna on an existing Luna-managed node, ip-192-168-20-122 because there is sufficient capacity on that node.
Workload-C was chosen such that its CPU usage can be configured to spike up or down as needed: cpu-stressor-pod.
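A minimal sketch of such a stressor deployment (the actual cpu-stressor-pod manifest linked above may differ) is:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-c
spec:
  replicas: 2
  selector:
    matchLabels:
      app: workload-c
  template:
    metadata:
      labels:
        app: workload-c
    spec:
      containers:
        - name: workload-c
          image: alpine:3.19
          # Busy-loop to burn CPU; running more or fewer such loops
          # raises or lowers usage relative to the pod's request.
          command: ["/bin/sh", "-c", "while true; do :; done"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
```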
We then see that the original pods’ usage begins to spike up, as captured below:
We see that the VPA updater evicts one of the pods, and a newly created replacement pod enters the Pending state:
At the same time, we see that the CPU and memory recommendations are updated within the VPA custom resource object.
Within a minute, we see that the pending pod successfully starts running on a newly created node (ip-192-168-30-113) whose creation was triggered by Luna. We verify that the node was in fact created by Luna by checking that its labels include node.elotl.co/created-by=luna.
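This check can be performed with a label selector, for example:

```shell
# List only the nodes carrying the Luna-created label
kubectl get nodes -l node.elotl.co/created-by=luna
```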
This experiment showcases that for Vertical Pod Autoscaling to be used in an automated fashion, an intelligent autoscaler like Luna is critical to scale out nodes when necessary.
Experiment 2: Interoperation of VPA in “Initial mode” and Luna
In this experiment, we illustrate an example where VPA recommends increased resources for a managed deployment. However, since VPA is configured in “Initial” mode, resource requests are not automatically applied to running containers; in this mode, recommendations are applied only during pod creation, so application administrators must restart pods for updated requests to take effect.
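For example, a rolling restart of the deployment (the deployment name is a placeholder) causes new pods to be admitted with the latest recommendation:

```shell
# In "Initial" mode, recommendations are applied only at pod creation,
# so restart the pods to pick up the latest recommendation.
kubectl rollout restart deployment/<deployment-name>
```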
Pods initially run on the existing Luna-managed node, ip-192-168-20-122.
After new VPA recommendations have been calculated in the VPA object, pods are deleted.
We see that 1 replica of the workload is started on a new Luna-triggered node (ip-192-168-3-62), taking into account the pod’s newly assigned resource request.
With the updated assignment, both the existing node (ip-192-168-20-122) and the new Luna provisioned node (ip-192-168-3-62) are operating at full capacity.
Experiment 3: VPA and Luna interoperation to handle Pod over-provisioning
When a pod is initially over-provisioned, VPA can recommend lower resource request values by observing resource usage over a period of time. These lower values, recommended by VPA, can result in a pod that Luna initially categorized as bin-select later being categorized as bin-pack. In the experiment described below, we showcase how VPA and Luna work together to handle this appropriately.
1. Creation of an over-provisioned workload
We create a workload, workload-g, that is overprovisioned. A VPA object is created for this deployment. Initially, VPA does not have a resource recommendation since there is insufficient historical data.
2. Pod placement as bin-select on separate nodes
Initially, the pods of this deployment request 2 CPUs each, as specified in the deployment manifest. Luna marks these pods as bin-select pods since the CPU request value meets Luna’s default 2-CPU bin-pack threshold (pods at or above this threshold are bin-selected).
As can be seen below, Luna places the pods on two separate nodes, ip-192-168-11-117 and ip-192-168-31-139.
3. Pod right-sizing by VPA
After a few minutes of operation, VPA utilizes the usage metrics and recommends the following resource requests. We see that the recommended CPU request for the pod is only 163m of CPU while the original CPU request in the pod’s manifest was for 2 CPUs.
4. Right-sized Pod placement via bin-packing by Luna
We use VPA in Auto update mode in this experiment, so the pods are restarted and updated with the lower recommended resource values automatically. Luna detects the new resource values on the restarted pods and places them as bin-pack pods on an existing bin-pack node, ip-192-168-20-122, as seen below.
From this experiment, we see that using Luna with VPA can help handle overprovisioned pods by right-sizing them and placing them on appropriate nodes automatically.
Experiment 4: VPA and Luna Interoperation to Handle Pod Under-provisioning
Just as applications can be overprovisioned, as we saw in Experiment 3, applications can also be under-provisioned. This can result in a degradation of application performance and necessitates prompt remediation. In the following example, we show how VPA and Luna operate together to handle this situation without any manual intervention.
1. Creation of an under-provisioned workload
An under-provisioned Kubernetes deployment, workload-f, is created. The workload’s CPU request is set to 100m in its manifest. We use the cpu-stressor-pod to configure its actual CPU usage to be much larger than this 100m request. A VPA object is also created for this deployment. Initially, the VPA object managing this deployment does not have any resource recommendations due to insufficient historical data:
2. Pod placement as bin-pack by Luna
Since the pod’s CPU request of 100m falls below the default bin-pack threshold of 2 CPUs, Luna places both replicas of workload-f on a bin-pack node, ip-192-168-20-122.
3. Pod right-sizing by VPA
Using the kubectl top command, we see that workload-f’s CPU usage is much higher than its original request value of 100m for CPU.
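For example (the label selector is illustrative):

```shell
# Show live CPU/memory usage of workload-f's pods (requires metrics-server)
kubectl top pods -l app=workload-f
```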
Soon, VPA utilizes the observed CPU usage values and recommends a higher CPU value of 2406m, as seen below:
4. Right-sized Pod placement via bin-select by Luna
Since VPA is in auto mode for this experiment, workload-f’s pods are recreated with the recommended higher CPU request values. The new CPU values now exceed Luna’s bin-packing threshold value of 2 CPUs. Luna, in turn, responds by placing these pods on newly created bin-select nodes ip-192-168-28-198 and ip-192-168-11-176.
From this experiment, we see that Luna and VPA work well together to manage under-provisioned resource requests of pods without any manual intervention.
VPA and In-place Pod Resizing
In-place pod resizing is a Kubernetes feature that allows a pod’s resource requests to be updated without evicting and restarting the pod. It has been available as an alpha feature since Kubernetes 1.27 (behind a feature gate) and graduated to beta in Kubernetes 1.33.
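For example, with the feature enabled, a pod’s CPU request can be patched in place via the resize subresource (the pod and container names here are illustrative; a recent kubectl is required):

```shell
# Resize a running pod's CPU request without restarting it
kubectl patch pod workload-a-7d4b9c-xyz12 --subresource resize --patch \
  '{"spec":{"containers":[{"name":"workload-a","resources":{"requests":{"cpu":"800m"}}}]}}'
```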
Currently, in released versions of VPA (as of April 2025), the vpa-updater component does not utilize in-place pod resizing. However, VPA is being extended to leverage this feature; details of this development are tracked here: AEP-4016. It is important to note that VPA with in-place updates is not guaranteed to prevent pod disruptions, since actuating the resize operation depends on the underlying container runtime; the end-user expectation is for pod disruptions to be minimal. When VPA is able to utilize in-place pod resizing, Luna’s hot node mitigation feature may help handle cases where pods with increased resource requests cause excessive node utilization. Hot node mitigation is described in detail in this blog post: Luna Hot Node Mitigation: A chill pill to cure pod performance problems.
Conclusion
In summary, when considering Vertical Pod Autoscaling for your Kubernetes workloads, leveraging an intelligent Kubernetes cluster autoscaler like Luna can ensure that restarted, scaled-up, or scaled-down pods in your cluster are placed on just-in-time, right-sized nodes in a fully automated fashion. If you would like to try VPA with an intelligent cluster autoscaler, please download Luna and reach out to us with questions or comments at [email protected].
Author: Selvi Kadirvel (VP Engineering, Elotl)