When managing machine learning workloads at scale, efficient GPU scheduling becomes critical. The KAI Scheduler takes a structured approach to resource allocation: it organizes jobs into queues and operates under the assumption that the cluster's GPU supply is fixed. (For those unfamiliar with KAI terminology, a "job" here is a unit of scheduling work within KAI's own abstraction, not a Kubernetes Job, the batch/v1 kind used for finite, batch-style workloads.) Each queue can be assigned limits and quotas, allowing administrators to control how resources are distributed across teams, projects, or workloads. This model ensures fair usage and predictability, but it also means that when demand exceeds supply, jobs sit idle waiting for resources, and when supply exceeds demand, the idle GPUs incur unnecessary cost.
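As a concrete illustration, here is roughly what a queue with a GPU quota and a pod submitted to it could look like. This is a sketch based on the KAI Scheduler quickstart; the queue name `team-a` and the specific quota values are assumptions, so verify the exact fields against the project's documentation for your version:

```yaml
# A KAI queue granting team-a a quota of 4 GPUs, with bursting
# above quota capped at 8 GPUs (illustrative values).
apiVersion: scheduling.run.ai/v2
kind: Queue
metadata:
  name: team-a
spec:
  resources:
    gpu:
      quota: 4           # GPUs the queue is entitled to
      limit: 8           # hard cap, even if spare capacity exists
      overQuotaWeight: 1 # share of unused capacity vs. other queues
---
# A workload submitted to that queue: the pod opts into the KAI
# scheduler and names its queue via a label.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-trainer
  labels:
    kai.scheduler/queue: team-a
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: nvcr.io/nvidia/pytorch:24.03-py3
      resources:
        limits:
          nvidia.com/gpu: 1
```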
This is where the real strength of the KAI Scheduler shines: pairing it with Luna, an intelligent autoscaler. Combined, the system becomes highly elastic, dynamically adding GPU nodes only when truly needed and scaling them back down to optimize efficiency. Instead of relying on a static pool of GPUs, the cluster can grow to meet active demand, but only up to what is necessary and permitted by the configured queue limits and quotas. Notably, Luna doesn't indiscriminately add nodes; it works intelligently alongside KAI, ensuring that scaling decisions respect organizational boundaries and cost controls. Beyond scaling decisions, Luna offers settings to guide GPU instance selection, adding another layer of precision.
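For example, instance selection can be steered per workload with a pod annotation. The annotation key below is our understanding of Luna's interface and should be treated as an assumption; confirm the exact key and regexp syntax against the Luna documentation for your release:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-trainer
  labels:
    kai.scheduler/queue: team-a
  annotations:
    # Restrict Luna's node provisioning for this pod to AWS g5
    # (A10G-backed) instance types. Assumed annotation key; verify
    # against the Luna docs for your version.
    node.elotl.co/instance-type-regexp: "^g5\\."
spec:
  schedulerName: kai-scheduler
  containers:
    - name: main
      image: nvcr.io/nvidia/pytorch:24.03-py3
      resources:
        limits:
          nvidia.com/gpu: 1
```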
Choosing accurate CPU and memory request values for Kubernetes workloads is a difficult endeavor. As a result, application developers tend to overprovision their workloads to ensure that application performance will not be affected, which drives up cloud costs and wastes resources. Workloads can also be inadvertently underprovisioned, which can degrade application performance and potentially even lead to service disruptions.
In this blog, we describe how the Kubernetes Vertical Pod Autoscaler (VPA) can be leveraged in conjunction with Luna, a powerful cluster autoscaler, so that workloads are right-sized by VPA and the cluster's nodes are right-sized by Luna, resulting in cost-effective and performant operations.
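As a minimal sketch of the VPA side of this pairing, the manifest below uses the standard `autoscaling.k8s.io/v1` VPA custom resource; the `my-app` Deployment name and the min/max bounds are hypothetical, for illustration only:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical workload
  updatePolicy:
    updateMode: "Auto"      # apply recommendations by evicting and recreating pods
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:         # floor to avoid starving the workload
          cpu: 100m
          memory: 128Mi
        maxAllowed:         # ceiling to bound cost
          cpu: "2"
          memory: 4Gi
```

As VPA adjusts requests up or down, Luna reacts to the new pod shapes, provisioning nodes that fit them and retiring nodes that are no longer needed.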