
Right-Sizing Your Kubernetes Pods with a Custom VPA Tracker

7/31/2025

 
The Kubernetes Vertical Pod Autoscaler (VPA) provides near-instantaneous recommendations for CPU and memory requests for a pod. It can be used either as a read-only recommender or in a fully automated mode, where pods are mutated with the recommended requests.

When a cluster operator is considering whether or not to use VPA for a specific workload, it is helpful to monitor and visualize VPA recommendations alongside actual resource usage over a test period, before using it in an automated fashion. In this blog, we illustrate how to track VPA operation over such a test period using a popular open-source monitoring and visualization stack for Kubernetes (which includes Prometheus and Grafana).

Motivation for VPA tracking

Kubernetes VPA can be used in two primary update modes: Off (read-only mode) and Auto (aka Recreate). In Off mode, the VPA custom resource provides near-instantaneous recommendations for suitable values of CPU and memory requests for pods in various types of Kubernetes resources, such as Deployments, Jobs, and DaemonSets. Workload administrators can use these recommendations to manually update pod requests. Given below is an example of CPU and memory recommendations within a VPA custom resource object.

Recommendation:
  Container Recommendations:
    Container Name:  workload-c
    ...
    Target:
      Cpu:     587m
      Memory:  262144k
...
    
As a pod’s resource usage changes, VPA recommendations also get updated based on the resource utilization data. However, if a cluster administrator wants to observe these recommendations over time and then use them to manually choose the right values for their pods, there is no out-of-the-box way to do so with VPA. We need a way to run a specific workload, managed by VPA, over a sufficient period of time, and to use a monitoring tool like Prometheus to collect both a) resource usage from the pod and b) Target recommendations from the VPA object. A visualization tool like Grafana can then be used to visually inspect these values over the test period. At periodic intervals, the maximum recommendation from VPA can then be used to manually update a pod’s manifest, which can be redeployed via appropriate rolling-update techniques on the cluster.
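
For reference, a minimal VPA object in Off mode might look like the following sketch; the object and Deployment names are illustrative and should match your own workload:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: workload-c-vpa          # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: workload-c            # the Deployment whose pods VPA should analyze
  updatePolicy:
    updateMode: "Off"           # read-only: recommendations are computed but pods are never mutated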

Let’s look into each of the components needed for this VPA tracker and the steps involved in setting up the monitoring and visualization stack for this example workload.

Workload and VPA object

A VPA custom resource object is needed for every Kubernetes resource that is to be managed by VPA. We create a sample workload and a VPA custom resource for this workload. The workload used in this blog post is available in this GitHub repo: elotl/vpa-tracker.

kubectl apply -f workload-c.yaml
    
The workload uses the CPU stressor pod from this GitHub repo: narmidm/k8s-pod-cpu-stressor. It allows us to control the CPU usage of a deployment’s pods via an input parameter in the deployment manifest, as sketched below.
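
A container spec along the following lines is what we have in mind; the image tag and the exact flag syntax should be taken from the stressor repo, and the values shown here are only illustrative:

containers:
- name: cpu-stressor
  image: narmidm/k8s-pod-cpu-stressor:latest   # image from the stressor repo; tag is illustrative
  args:
  - "-cpu=0.1"   # fraction of a CPU core to consume; changing 0.1 to 0.2 produces the step increase used later in this post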

VPA metrics exporter

The VPA object makes its resource recommendations available in the object’s Status field. We created a simple Python script to export metrics from all VPA custom resources in our cluster to a /metrics endpoint. This exporter is in this Elotl public repo. The VPA exporter consists of a Kubernetes deployment and service and can be deployed as follows. A release=kube-prometheus-stack label is added to the exporter’s ServiceMonitor so that the Prometheus instance installed by kube-prometheus-stack selects it for scraping.

kubectl apply -f vpa-tracker/vpa-metrics-exporter/vpa_exporter.yaml
kubectl port-forward svc/vpa-exporter 8080:8080
kubectl label servicemonitor vpa-exporter release=kube-prometheus-stack --overwrite
    

Monitoring of VPA metrics 

Any Kubernetes monitoring tool can be used to monitor workload resource usage and the VPA metrics. As an example, in this blog, we use these open-source tools: 
  • kube-state-metrics (along with the kubelet/cAdvisor endpoints scraped by the stack) for exporting Kubernetes resource metrics, such as CPU and memory usage
  • Prometheus for scraping both usage and VPA metrics from their respective endpoints
  • Grafana for visualizing metrics via Dashboards
The kube-prometheus-stack project is an easy way to install these three components. 
Prometheus, when installed via the kube-prometheus-stack, by default scrapes all metrics collected by the kube-state-metrics tool. However, an additional configuration step is needed to scrape the new VPA metrics exported by the VPA exporter described in the prior section. This is done by creating a ServiceMonitor custom resource object and exposing the needed Service.

kubectl apply -f vpa-recommender-servicemonitor.yaml
kubectl apply -f vpa-metrics-expose-svc.yaml
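
For reference, a ServiceMonitor for VPA metrics could look roughly like the sketch below; the names, labels, and port are illustrative and must match your exporter Service and your kube-prometheus-stack release:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vpa-exporter
  labels:
    release: kube-prometheus-stack   # label selected by the Prometheus installed via kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: vpa-exporter              # must match the labels on the exporter Service
  endpoints:
  - port: metrics                    # named Service port serving the /metrics endpoint
    interval: 30s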
    

Visualization of VPA metrics

Target refers to the recommended values of CPU and memory requests for the workload. It corresponds to the 90th percentile (by default) of the decaying histogram of observed peak usage values. This percentile can be configured using the flags --target-cpu-percentile and --target-memory-percentile when starting up the vpa-recommender.

Uncapped Target refers to the recommended values of CPU and memory requests for a pod without taking into account the maximum allowed values (maxAllowed) in the Spec section of the VPA custom resource object.
[Grafana panel: CPU usage of the sample workload along with the VPA Target and Uncapped Target recommendations over the test period]
Let’s look in detail at an example of the custom panel. In the graph above, at around 12pm, we increase the CPU usage of the CPU stressor pod from 120 millicores to 230 millicores. We do this by editing the deployment’s cpu flag from a value of 0.1 to 0.2. We see that, at ~2:45pm, the VPA target recommendations (shown in yellow and green, and overlapping in this case) increase to an appropriate value of ~260 millicores.

Scale-up and Scale-down Response Times

By scale-up response time, we refer to the time taken for the VPA CPU target to envelop a step increase in CPU usage. In many practical use cases, increases in CPU usage can also be gradual. For the sample workload above and the default VPA configuration parameters, we see that the scale-up response time is approximately 2 hours 45 minutes.

Similarly, by scale-down response time, we refer to the time taken for the VPA CPU target to respond to a step decrease in CPU usage. The scale-down response time for the sample workload and the default parameters of the VPA recommender is ~3 days and is shown in the graph below.
[Grafana panel: scale-down response of the VPA CPU target to a step decrease in CPU usage]
The key configuration parameter of the vpa-recommender that determines this response time is --cpu-histogram-decay-half-life. This is the time duration after which the weight of each CPU/memory observation in the calculation of the target is halved; the smaller this value, the faster the response times. Typically, we want a long-enough response time such that transient or periodically repeating peaks and valleys in CPU usage do not influence the recommended target. Its default value is 24 hours, and users are recommended to increase or decrease it based on the usage patterns of their particular workload.
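
As a sketch, these flags can be set on the recommender container in the vpa-recommender Deployment; the values below are purely illustrative and not recommendations:

# excerpt from the vpa-recommender Deployment spec (flag values shown are illustrative)
containers:
- name: recommender
  args:
  - --cpu-histogram-decay-half-life=12h0m0s   # default is 24h0m0s; smaller values make the target react faster
  - --target-cpu-percentile=0.95              # default is 0.9 (90th percentile)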

VPA Tracker Reports

As the final step in the VPA tracking workflow, a cluster operator can optionally set up Prometheus Alertmanager to send a report of the final target recommendation at the end of each testing period. Alternatively, reviewing the Grafana panel over the testing period allows the operator to identify and choose either the peak or the most recent target recommendation.
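
One way to drive such a report is with a PrometheusRule whose alert annotation templates in the current target value, while the routing to Slack and the timing of the report are handled by the Alertmanager configuration. The sketch below is only illustrative: the metric name and the VPA object name are hypothetical and should be replaced with those actually exposed by the VPA exporter.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vpa-tracker-report
  labels:
    release: kube-prometheus-stack
spec:
  groups:
  - name: vpa-tracker
    rules:
    - alert: VPACpuTargetReport
      # hypothetical metric name; substitute the metric exposed by the VPA exporter
      expr: vpa_recommendation_target_cpu_millicores{vpa="workload-c-vpa"} > 0
      labels:
        severity: info
      annotations:
        summary: "VPA CPU target for workload-c is {{ $value }} millicores"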

We provide an example of using Alertmanager to send a message to a Slack channel at the end of each testing period with the recommended VPA CPU target value here: vpa-tracker-reports. The graphic below shows a sample alert from a Slack channel for workload-c.
[Sample Slack alert showing the recommended VPA CPU target for workload-c]
If, at the end of a few iterations of testing, the VPA recommendations work well, the cluster operator can choose to either a) manually update the resource requests of pods or b) run VPA in Auto update mode.

Luna Autoscaler and VPA

When VPA recommends resource values that exceed the cluster’s current capacity, using an intelligent cluster autoscaler like Luna can help ensure that workloads continue to run without interruption and without any manual intervention to add cluster capacity. Similarly, when VPA recommends target values that would leave some cluster nodes under-utilized, Luna can detect this and scale down the appropriate nodes. This helps keep cluster operation costs in check.

If you are interested in using VPA with Luna, please download our free trial version from here: Luna Free Trial. And do write to us if you would like some help getting started: [email protected].


Author:

Selvi Kadirvel (VP Engineering, Elotl)




© 2025 Elotl, Inc.