Elotl
  • Home
  • Products
    • Luna
    • Nova
  • Resources
  • Podcast
  • Company
    • Team
    • Careers
    • Contact
  • Free Trial
    • Luna Free Trial
    • Nova Free Trial

Thrifty-Nova: Cost-Ordered AI Workload Placement for Multi-Cluster K8s with Autoscaled Cloud Clusters

11/18/2025

ABSTRACT

In a multi-cluster Kubernetes (K8s) environment, when the statically-allocated clusters lack sufficient free resources to schedule a workload, an autoscaled cloud cluster can be used to obtain the resources needed to run it.  Selecting the autoscaled cloud cluster that can obtain those resources at the lowest estimated price is desirable, particularly for AI workloads requiring GPUs, since cloud GPU supply can be limited and costs can be high and vary greatly across vendors.

In this blog, we present Thrifty-Nova, a tool for performing cost-ordered workload placement on autoscaled cloud clusters.  Thrifty-Nova leverages the Nova fleet manager's policy-driven multi-cluster scheduling and the Luna Smart cluster autoscaler's node cost estimate feature to create a Nova placement policy that is customized to the workload with respect to relevant cloud resource availability and price.  We give several examples of Thrifty-Nova usage that show the value of automating workload cluster selection in cost-order priority, given the impact of workload configuration and dynamic resource availability on successful placement.

INTRODUCTION

Nova manages a multi-cluster, multi-cloud K8s fleet, scheduling K8s workloads on target clusters in accordance with scheduling policies and free capacity, as shown in Figure 1.  Nova handles a variety of use-cases, including: workload placement for resource availability or quality, as presented here, with optional cross-cluster placement using, e.g., Cilium Cluster Mesh stretched networking, as covered in this three-blog series (blog1, blog2, blog3); priority-based cluster selection, allowing preferential workload placement on on-premise or reserved clusters, as described here; duplicate workload placement for common tooling or service continuity, as discussed here; and workload migration for cluster maintenance or upgrade, as illustrated here.
Figure 1: Nova Multi-Cluster Fleet Manager
Nova interoperates with cloud cluster autoscalers, including the K8s Cluster Autoscaler and the Luna Smart cluster autoscaler.  If no workload cluster that meets a schedule group's policy has sufficient free capacity for the group, Nova places the group on an autoscaled cluster that meets the policy, with the expectation that the autoscaler will add the needed capacity, as discussed here.  Luna was recently updated to provide node cost estimation for pods.  As described here, for Luna-managed pods whose scheduling readiness is blocked by the nodecostestimate K8s scheduling gate, Luna reports a pod event indicating the node type it would allocate were the pod schedulable, along with that type's estimated hourly compute cost.  Leveraging these capabilities of Nova and Luna, Thrifty-Nova dynamically creates a Nova cluster-priority group policy that has Nova select the lowest-estimated-price cluster able to run the workload.

THRIFTY-NOVA OPERATION

Given a workload to be run at the lowest price, Thrifty-Nova determines the per-cluster workload cost estimates using Nova and Luna.  Thrifty-Nova then creates a Nova policy for cost-ordered placement and deploys the workload using that policy.

To determine the per-cluster workload cost estimates using Nova and Luna, Thrifty-Nova does the following:

  • Deploys a nodecostestimate schedule-gated version of the workload using a Nova spread/duplicate policy.
  • Gathers NodeCostEstimate events for the workload pods running on Luna-enabled clusters and sums them.
  • Treats statically-allocated clusters as 0 cost and autoscaled clusters not reporting estimates as max cost.
  • Undeploys the schedule-gated version of the workload and the associated spread/duplicate policy.
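The event-gathering and summing steps above can be sketched as follows; the per-pod figures are illustrative stand-ins for values parsed out of Luna NodeCostEstimate events (the actual parsing of event messages is elided):

```shell
# Sum per-pod hourly cost estimates (as parsed from Luna NodeCostEstimate
# events) into a per-cluster workload estimate. The "cluster pod cost"
# lines below are illustrative stand-ins for the parsed event data.
estimates='autoscale-gke-a head 0.181
autoscale-gke-a worker-0 1.734
autoscale-gke-a worker-1 1.734'
echo "$estimates" | awk '{sum[$1] += $3} END {for (c in sum) printf "%s %.3f\n", c, sum[c]}'
```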

Note that the Luna NodeCostEstimate event will indicate if Luna would not currently expect to obtain a pod's needed resources, e.g., due to stock-out or quota backoffs; Thrifty-Nova treats any such clusters as having max cost.  Also note that when Luna estimates the cost of a node to host a pod, it does so based on the information it has at that point.  When Luna actually allocates a node for the pod, it may allocate a more expensive node type (if the node type used for its estimate is not available) or a less expensive node type (if Luna considered the node type unavailable at the time of its estimate).  The cost of a node Luna will allocate for a pod can be capped by annotating the pod with node.elotl.co/instance-max-cost set to the cost maximum.
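For illustration, a pod in the schedule-gated version of the workload might carry the following (the pod name, image, and cap value are placeholders; the scheduling-gate name and cost-cap annotation are those described above):

```yaml
# Illustrative schedule-gated pod: the gate keeps the pod unschedulable so
# Luna emits a NodeCostEstimate event instead of allocating a node, and the
# annotation caps the hourly cost of any node Luna would allocate for it.
apiVersion: v1
kind: Pod
metadata:
  name: cost-probe                           # placeholder name
  annotations:
    node.elotl.co/instance-max-cost: "2.50"  # illustrative hourly cap
spec:
  schedulingGates:
  - name: nodecostestimate
  containers:
  - name: worker
    image: rayproject/ray:2.9.0              # placeholder image
    resources:
      requests:
        nvidia.com/gpu: 1
```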

To create a Nova policy for cost-ordered placement and deploy a workload using that policy, Thrifty-Nova does the following:

  • Creates a Nova cluster-priority group policy, with the clusters specified in ascending cost order.
  • Deploys a non-schedule-gated version of the workload using that policy.

Based on the policy, the Nova control plane gang-schedules the workload on the first cluster on which the workload appears to fit.  If the workload doesn't fit on a statically-allocated cluster, Nova chooses the first autoscaled cluster in the list.  If a Luna-autoscaled cluster cannot obtain the resources to run a pod, it reports a NodeAddRequestWarning event; Nova detects that pod event and retries the group placement on the next cluster in the priority list.  Note that Nova retries the clusters in the priority list in round-robin fashion, meaning that a Luna cluster that previously failed could eventually be retried if no other cluster is able to host the workload.
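As a rough sketch, the generated cluster-priority policy might resemble the following (the API group, kind, and every field name here are assumptions for illustration only; consult the Nova policy documentation for the actual schema):

```yaml
# Hypothetical sketch of a Nova cluster-priority group policy; the field
# names below are assumptions, shown only to convey the shape of the idea.
apiVersion: policy.elotl.co/v1alpha1   # assumed API group/version
kind: SchedulePolicy                   # assumed kind
metadata:
  name: thrifty-cost-order             # hypothetical policy name
spec:
  groupScheduling:                     # gang-schedule the labeled workload group
    labelKey: app                      # label key passed to Thrifty-Nova
    labelValue: thrifty-demo           # label value passed to Thrifty-Nova
  clusterPriorityList:                 # clusters in ascending estimated cost
  - static-gke
  - autoscale-gke-a
  - autoscale-eks
  - autoscale-aks
  - autoscale-gke-f
```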

The Thrifty-Nova tool script is here.  Its arguments are: the path to a local try-nova repo clone; the schedule-gated and non-gated workload YAMLs; the namespace to use for the workload policy and deployment; and the label key and value that select workload objects for Nova group placement.  To try it out, you'll need to install the Nova control plane on a host K8s cluster and the Nova agent on each of the workload clusters; Nova installation instructions are here.  You'll also need to ensure that the namespace used for the workload policy is available on all of the workload clusters; an example Nova spread/duplicate policy can be found here, which Nova could apply to the namespace deployment here.
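For concreteness, an invocation might look like the following sketch (the script and file names are placeholders, not the actual linked artifacts; the positional-argument order follows the description above):

```shell
# Hypothetical Thrifty-Nova invocation; all names below are placeholders.
# Args: try-nova clone path, gated yaml, non-gated yaml, namespace,
# label key, label value (per the argument list described above).
set -- "$HOME/try-nova" rayservice-gated.yaml rayservice.yaml ray-ns app thrifty-demo
echo "./thrifty-nova.sh $*"
```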

THRIFTY-NOVA EXPERIMENTS

The Thrifty-Nova experiments were run using Nova v1.3.12 for the clusters listed in Table 1.  The Luna clusters used Luna v1.4.0.
Nova Cluster Role | Cluster Name        | Cloud K8s | K8s Version | Location      | Resource Allocation
Control Plane     | control-plane-host4 | GKE       | 1.33        | us-central1   | static
Workload          | static-gke          | GKE       | 1.33        | us-central1   | static
Workload          | autoscale-gke-a     | GKE       | 1.33        | us-central1-a | dynamic via Luna
Workload          | autoscale-gke-f     | GKE       | 1.33        | us-central1-f | dynamic via Luna
Workload          | autoscale-aks       | AKS       | 1.32        | eastus        | dynamic via Luna
Workload          | autoscale-eks       | EKS       | 1.33        | us-west-2     | dynamic via Luna
Table 1: Clusters used in Thrifty-Nova Experiments
The workload for the experiments is LLM model serving via a KubeRay RayService deployment running on Nova's SkyRay platform.  SkyRay, presented here and documented here, requires Nova spread/duplicate scheduling of KubeRay to all workload clusters on which a Ray object may be placed; a simple approach is to place it on all clusters.  We used KubeRay 1.4.2.

The experiments used the model microsoft/Phi-3-mini-4k-instruct, which runs efficiently on mid-tier NVIDIA GPU SKUs such as L4, A10G, A10, and L40S.  The Luna option to specify the desired GPU SKU choices was used for the RayService worker pods; on Luna-enabled clusters, Luna ensured that the associated pods were placed on the lowest-cost available node types satisfying the GPU SKU constraint.  To ensure placement on the desired GPU models on the static cluster, node affinity to GPU model labels on those nodes was used.  The GKE NVIDIA daemonset automatically adds the node label cloud.google.com/gke-accelerator set to the GPU model from this list; that label is used in the following nodeAffinity setting, which works for both Luna and non-Luna clusters (the matchExpressions are ORed):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.elotl.co/created-by
                operator: In
                values:
                - luna
            - matchExpressions:
              - key: cloud.google.com/gke-accelerator
                operator: In
                values:
                - <GKE-model-name1>...
                - <GKE-model-nameN>
    
Note that on non-GKE K8s clusters, NVIDIA GPU Feature Discovery (GFD) in the k8s-device-plugin daemonset similarly sets the node label nvidia.com/gpu.product automatically, to the NVIDIA GPU product name derived from this list, so static clusters using GFD can use that key to specify the desired GPU model(s).
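For example, the GFD-based equivalent of the second matchExpressions term above might look like this (the product-name values are placeholders; actual values come from GFD's labels on the nodes):

```yaml
# Sketch: nodeAffinity term for non-GKE static clusters using GPU Feature
# Discovery; the values listed are placeholders for real GFD product names.
- matchExpressions:
  - key: nvidia.com/gpu.product
    operator: In
    values:
    - <NVIDIA-product-name1>
    - <NVIDIA-product-nameN>
```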

Experiment 1: RayService with 2 mid-tier 1-GPU workers

For Experiment 1, Thrifty-Nova was asked to place a RayService comprising a CPU-only head (2 CPUs, 16GB) and 2 single-NVIDIA-GPU workers (16 CPUs, 16GB each), per the schedule-gated config here and non-gated config here.  Thrifty-Nova created a placement policy with the clusters in the priority order static-gke, autoscale-gke-a, autoscale-eks, autoscale-aks, autoscale-gke-f, per the cost estimates shown in Table 2.  The static-gke cluster was first at 0 cost, since no additional cost would be incurred by placing the workload on that cluster.  The autoscale-gke-f cluster was last at max cost, because us-central1-f did not have any capacity for the specified GPU SKUs.

When Nova ran placement with the created policy, static-gke had 2 1-GPU L4 nodes allocated and available, providing sufficient resources for the workload, so the placement succeeded.
Cluster Name    | Est. Workload Cost ($/hr) | Head Node Type (Est. Cost) | Worker Node(s) Type (Est. Cost)    | Cluster Selection Status
static-gke      | 0                         | N/A                        | N/A                                | Selected
autoscale-gke-a | 3.649                     | e2-highmem-4 (0.181)       | 2x g2-standard-32 (1.734)          | -
autoscale-eks   | 4.254                     | r5a.xlarge (0.226)         | 2x g6.8xlarge (2.014)              | -
autoscale-aks   | 6.626                     | Standard_E4as_v5 (0.226)   | 2x Standard_NV36ads_A10_v5 (3.200) | -
autoscale-gke-f | max                       | e2-highmem-4 (0.181)       | No NVIDIA GPUs for requested SKUs  | -
Table 2: Per-Cluster Estimated Workload Cost for Experiment 1

Experiment 2: RayService with 2 mid-tier 2-GPU workers

For Experiment 2, the workload was specified to have 2 2-GPU workers rather than 2 1-GPU workers, with the schedule-gated config here and non-gated config here. Thrifty-Nova again created a placement policy that specified the clusters in the order: static-gke, autoscale-gke-a, autoscale-eks, autoscale-aks, autoscale-gke-f, as per the cost estimates shown in Table 3.

When Nova ran placement with the created policy, static-gke did not have any available 2-GPU resources, so Nova next attempted to place the workload on autoscale-gke-a.  If Nova placement was run during off-peak hours, Luna was able to scale up autoscale-gke-a, so Nova placement there was successful.  However, if Nova placement was run during peak hours, Luna encountered stock-out for all of the candidate GPU instances in that cluster, and Nova then tried placement of the workload on autoscale-eks, where Luna was able to allocate the resources.
Cluster Name    | Est. Workload Cost ($/hr) | Head Node Type (Est. Cost) | Worker Node(s) Type (Est. Cost)    | Cluster Selection Status
static-gke      | 0                         | N/A                        | N/A                                | Insufficient 2-GPU resources
autoscale-gke-a | 4.182                     | e2-highmem-4 (0.181)       | 2x g2-standard-24 (2.001)          | Selected during off-peak; stock-out during peak
autoscale-eks   | 4.828                     | r5a.xlarge (0.226)         | 1x g6.12xlarge (4.602)             | Selected during peak
autoscale-aks   | 13.266                    | Standard_E4as_v5 (0.226)   | 2x Standard_NV72ads_A10_v5 (6.520) | -
autoscale-gke-f | max                       | e2-highmem-4 (0.181)       | No NVIDIA GPUs for requested SKUs  | -
Table 3: Per-Cluster Estimated Workload Cost for Experiment 2

Experiment 3: RayService with 2 A100 1-GPU workers

For Experiment 3, 2 1-GPU workers were specified to use the A100 GPU SKU rather than one of the mid-tier GPU SKUs previously listed, with the schedule-gated config here and non-gated config here.  In this case, Thrifty-Nova created a placement policy that specified the clusters in the order: static-gke, autoscale-aks, autoscale-gke-a, autoscale-gke-f, autoscale-eks, as shown in Table 4.

Nova attempted placement on static-gke, autoscale-aks, autoscale-gke-a, and autoscale-gke-f, but there were no A100 instances in static-gke and Luna could not allocate A100-enabled instances on the AKS and GKE autoscaled clusters due to our accounts on those clouds having insufficient A100 quota.  Nova next attempted placement of the workload to autoscale-eks, where Luna was able to allocate the resources.
Cluster Name    | Est. Workload Cost ($/hr) | Head Node Type (Est. Cost) | Worker Node(s) Type (Est. Cost)      | Cluster Selection Status
static-gke      | 0                         | N/A                        | N/A                                  | Insufficient A100 resources
autoscale-aks   | 7.572                     | Standard_E4as_v5 (0.226)   | 2x Standard_NC24ads_A100_v4 (3.673)  | Insufficient A100 quota
autoscale-gke-a | 14.859                    | e2-highmem-4 (0.181)       | 2x a2-highgpu-2g (7.339)             | Insufficient A100 quota
autoscale-gke-f | 14.859                    | e2-highmem-4 (0.181)       | 2x a2-highgpu-2g (7.339)             | Insufficient A100 quota
autoscale-eks   | 22.183                    | r5a.xlarge (0.226)         | 1x p4d.24xlarge (21.958)             | Selected
Table 4: Per-Cluster Estimated Workload Cost for Experiment 3

SUMMARY

We've presented Thrifty-Nova, a tool for performing cost-ordered workload placement on a mix of on-premise and cloud clusters managed by the Nova fleet manager, including cloud clusters running the Luna Smart autoscaler.  Thrifty-Nova uses a Nova spread/duplicate policy to estimate workload costs via Luna's node cost estimate feature, and then creates a Nova cluster-priority group policy to perform workload placement in cluster cost order.  Our examples show how that policy allows the lowest-cost available resources to be allocated, leveraging Nova and Luna to respond dynamically to capacity constraints, including cloud stock-out and quota limits.

Are you sensitive to cost and resource availability for your workloads, especially expensive AI workloads, when choosing between your on-premise, reserved, and autoscaled cloud K8s clusters?  Thrifty-Nova is available as a simple shell script that you can use with free trial versions of Nova and Luna.  We invite you to try Nova, Luna, and Thrifty-Nova, and to let us know how it goes!



Author:
Anne Holler (Chief Scientist, Elotl)


​© 2025 Elotl, Inc.