Elotl

SuperSkyRay, Part 3: Rescheduling Ray AI Apps Between K8s Clusters for RayService Cluster Upgrade/Reconfigure

11/2/2025

 

Abstract

In our blogs “SuperSkyRay, Part 1: Running Ray AI Apps Across K8s Clusters for Resource and Time Efficiency” and “SuperSkyRay, Part 2: Scaling Ray AI Apps Across K8s Clusters for No-downtime Resource Increases”, we discussed SuperSkyRay’s support for running Ray apps managed by KubeRay across multiple K8s clusters linked by Cilium Cluster Mesh, as well as its non-disruptive handling of Ray apps that outgrow single-cluster placement by extending them to multi-cluster placement.

In this blog, we consider how SuperSkyRay handles KubeRay RayServices that outgrow the single Kubernetes (K8s) cluster hosting them during a zero-downtime Ray cluster upgrade or reconfiguration.  To support zero downtime (the default), the RayService keeps the current Ray cluster running while it brings up an additional Ray cluster with the new configuration; the upgrade or reconfiguration is incomplete until the new version of the Ray cluster is available.  When there are insufficient resources for a second RayCluster, SuperSkyRay can reschedule a RayService deployed on a single cluster onto a different cluster so that the update does not stall indefinitely.  While this relocation involves downtime, it is appropriate when time-to-update is critical and resources are limited.

Introduction

When any field in spec.rayClusterConfig of a running RayService is changed, KubeRay by default performs a zero-downtime upgrade of the Ray cluster as follows.  It keeps the current copy of the Ray cluster running to continue processing service requests while it deploys an additional version of the Ray cluster with the updates.  Once the new version is fully ready, it switches the service to the updated Ray cluster and removes the old one.  While this avoids service downtime, it requires that the K8s cluster hosting the RayService have sufficient resources to run two copies of the Ray cluster.  When this is not possible, the service update remains incomplete for an indefinite period of time, which is undesirable.  (RayService zero-downtime upgrade can be disabled by setting ENABLE_ZERO_DOWNTIME to false, in which case cluster config changes do not trigger any upgrade operation, which can also be undesirable.)
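
As a minimal sketch of what triggers this flow, the snippet below patches spec.rayClusterConfig of a RayService with the official Kubernetes Python client; it assumes KubeRay’s ray.io/v1 RayService CRD, and the RayService name, namespace, and Ray version are placeholders rather than values from this post.

    # Minimal sketch, assuming the ray.io/v1 RayService CRD and the official
    # `kubernetes` Python client: any change under spec.rayClusterConfig makes
    # KubeRay (by default) build a second RayCluster with the new config and
    # switch traffic over once it is ready. Name, namespace, and version are
    # placeholders.
    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    api.patch_namespaced_custom_object(
        group="ray.io",
        version="v1",
        namespace="default",
        plural="rayservices",
        name="example-rayservice",
        body={"spec": {"rayClusterConfig": {"rayVersion": "2.34.0"}}},
    )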

Read More

SuperSkyRay, Part 2: Scaling Ray AI Apps Across K8s Clusters for No-downtime Resource Increases

11/2/2025

 

Abstract

In our previous blog, SuperSkyRay, Part 1: Running Ray AI Apps Across K8s Clusters for Resource and Time Efficiency, we discussed how SuperSkyRay could be used to run Ray apps managed by KubeRay across multiple K8s clusters linked by Cilium Cluster Mesh.

In this blog, we turn our attention to how SuperSkyRay can non-disruptively handle Ray apps that outgrow their single Kubernetes (K8s) cluster placement.  SuperSkyRay can dynamically change the Ray app placement from single-cluster to cross-cluster, increasing the app’s resources without requiring any app relocation downtime.

Introduction

When SuperNova (Nova with multi-cluster-capacity set) performs capacity-based scheduling of a K8s object group, it prefers to place the group on a single cluster if possible, since that choice is simpler in terms of management and networking than cross-cluster placement.  If a group placed on a single cluster contains an app whose worker count is later scaled up, the result may no longer fit on that cluster, e.g., because the cluster has reached its fixed size limit, as is the case for on-premises or cloud reserved-instance clusters.  When a group no longer fits on its cluster, SuperNova seeks to reschedule the group.

Read More

SuperSkyRay, Part 1: Running Ray AI Apps Across K8s Clusters for Resource and Time Efficiency

11/2/2025

 

Abstract

This blog presents SuperSkyRay, a name we gave to supporting Ray app execution via KubeRay across Kubernetes (K8s) clusters running the Cilium Cluster Mesh multi-cluster datapath.  SuperSkyRay uses the Nova K8s fleet manager to perform cross-cluster placement in accordance with KubeRay and Cluster Mesh operation.  SuperSkyRay addresses the resource and time inefficiency that occurs when resources needed for Ray apps are fragmented across K8s clusters.

Introduction

Organizations using KubeRay to run the Ray ML platform on K8s often have multiple clusters for reasons such as resource availability and cost, service continuity, geo-location, and quality of service.  SkyRay reduces the toil of managing instances of KubeRay running on a fleet of K8s clusters by providing policy-driven resource-aware scheduling of Ray apps onto K8s clusters.  However, SkyRay does not address the inefficiency that occurs if the desired scale of a Ray app exceeds the spare capacity of any single cluster in the fleet, while at the same time the fleet has sufficient idle resources fragmented across clusters. In this case, the app runs with fewer resources than desired or is delayed until enough single-cluster capacity is freed.  This inefficiency could be addressed if the Ray app could be run across multiple K8s clusters.

Read More

Avoiding AI Workload Cloud Sticker Shock

9/25/2025

 

Using the Cost Estimation Feature in the Luna K8s Smart Autoscaler to Preview and Tune AI Workload Cloud Computing Expenses

While running AI workloads on cloud K8s clusters can make resource scaling seamless, it can also lead to the sticker shock of unexpectedly high cloud bills.  And tuning AI workload resource allocation for usage increases can be unintuitive and inefficient, given the idiosyncrasies of cloud vendor node types and prices.  In this blog, we introduce the Luna Smart Cluster Autoscaler Cost Estimation feature for estimating the node cost of pods before they run.  We show how Luna's node cost estimation feature avoids AI workload sticker shock and facilitates assessing strategies for AI workload scaling.


Read More

Elotl receives investment from Cisco Investments to accelerate AI-ready Infra for Multi-Cloud Era

8/14/2025


 
We are excited to announce an investment from Cisco Investments to accelerate AI-ready Infra for Enterprise AI platform teams!

AI software stacks have standardized on top of Kubernetes. Elotl’s enterprise-grade, battle-tested Luna provisions just-in-time, right-sized compute for Kubernetes. Luna prevents wasted GPU spend for AI workloads while simplifying operations.

Enterprise AI must meet response time SLAs before going live. Since expensive accelerators like GPUs are in short supply, waiting to source compute from a single region/datacenter/hyperscaler/neocloud would jeopardize AI business SLAs. Kubernetes platform teams need to dynamically source compute from multiple regions and cloud providers to be AI ready. This calls for a federated compute fabric spanning across on-prem datacenters, hyperscalers, and neoclouds. Elotl Nova is a policy-driven federated compute fabric that commoditizes Kubernetes clusters across regions and cloud providers.

As AI workloads scale, the need for robust, secure, and scalable networking becomes just as critical as compute. Through the acquisition of Isovalent in 2024, Cisco added the industry standard for Kubernetes networking and security, including technologies like Cilium and Tetragon, to its solutions for enterprise AI and cloud-native environments. These technologies are now foundational for enterprises running cloud-native and AI workloads on Kubernetes, providing the networking, security, and observability capabilities needed to support dynamic, distributed environments.

At Elotl, we’re committed to helping enterprises focus on building AI solutions while we take care of infrastructure complexity. With Cisco’s investment and the strength of its industry-leading technologies, organizations can accelerate innovation and confidently run AI across multi-cloud environments. Here is a demo of cloud bursting AI workloads from an on-prem datacenter to Azure using Nova, Cilium Cluster Mesh, and Hubble:

​If you are interested in using Luna and/or Nova for your self-hosted training/inference/batch initiatives, please reach out at [email protected]

Author: Madhuri Yechuri


Right-Sizing Your Kubernetes Pods with a Custom VPA Tracker

7/31/2025

 
The Kubernetes Vertical Pod Autoscaler (VPA) provides near-instantaneous recommendations for CPU and memory requests for a pod. It can be used either as a read-only recommender or as a fully automated one, in which pods are mutated with the recommended requests.

When a cluster operator is considering whether or not to use VPA for a specific workload, it is helpful to simply monitor and visualize both VPA recommendations and actual resource usage over a test period before using it in an automated fashion. In this blog, we illustrate how we can track VPA operation over such a test period using a popular open-source monitoring and visualization stack for Kubernetes (which includes Prometheus and Grafana).

Motivation for VPA tracking

Kubernetes VPA can be used in two primary update modes: Off (read-only mode) and Auto (aka Recreate). In the Off mode, the VPA custom resource provides near-instantaneous recommendations for suitable values of CPU and memory requests for pods in various types of Kubernetes resources, such as deployments, jobs, daemonsets, etc. Workload administrators can use these recommendations to manually update pod requests. Given below is an example of CPU and memory recommendations within a VPA custom resource object.
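
As a hedged illustration of the shape of that data, the recommendation block of an autoscaling.k8s.io VerticalPodAutoscaler looks roughly like the following, written here as a Python dict; the container name and the CPU/memory quantities are placeholders, not values from this post.

    # Rough shape of status.recommendation in a VerticalPodAutoscaler object
    # (autoscaling.k8s.io), expressed as a Python dict. The container name and
    # quantities below are placeholders for illustration.
    vpa_recommendation = {
        "containerRecommendations": [
            {
                "containerName": "app",
                "lowerBound":     {"cpu": "100m", "memory": "256Mi"},
                "target":         {"cpu": "250m", "memory": "512Mi"},
                "uncappedTarget": {"cpu": "250m", "memory": "512Mi"},
                "upperBound":     {"cpu": "1",    "memory": "1Gi"},
            }
        ]
    }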

Read More

Luna now supports RKE2 clusters on AWS EC2

7/3/2025

 
The Luna cluster autoscaler can now run with SUSE's RKE2 clusters on AWS EC2 nodes.
Compared to EKS, RKE2 on EC2 offers more operational control, better customization, improved flexibility, and federation across different infrastructures: EC2, on-prem, and edge.
Luna 1.2.19 can create and manage RKE2 worker nodes, allowing you to scale your RKE2 compute resources more efficiently than with the basic Kubernetes cluster autoscaler.


Read More

Building an Elastic GPU Cluster with the KAI Scheduler and Luna Autoscaler

5/28/2025

 
When managing machine learning workloads at scale, efficient GPU scheduling becomes critical. The KAI Scheduler introduces a structured approach to resource allocation by organizing jobs into queues and operating under the assumption of fixed GPU resources available within the cluster. For those not familiar with KAI terminology, the term "job" refers to a unit of scheduling work defined within KAI’s own abstraction, not to be confused with a Kubernetes Job resource (i.e., the batch/v1 kind used in Kubernetes for running finite, batch-style workloads). Each queue can be assigned limits and quotas, allowing administrators to control how resources are distributed across teams, projects, or workloads. This model ensures fair usage and predictability, but it also means that when demand exceeds supply, jobs can sit idle waiting for resources to become available, and when supply exceeds demand, unnecessary costs are incurred.

This is where the real strength of the KAI Scheduler can shine: pairing it with an intelligent autoscaler like Luna. With this combination, the system becomes highly elastic, able to dynamically add GPU nodes only when truly needed and scale them back down to optimize efficiency. Instead of relying on a static pool of GPUs, the cluster can grow to meet active demand, but only up to what is necessary and permitted by the configured queue limits and quotas. It’s worth noting that Luna doesn’t indiscriminately add nodes; it works intelligently alongside KAI, ensuring that scaling decisions respect organizational boundaries and cost controls.  Beyond scaling decisions, Luna offers settings to guide GPU instance selection, adding another layer of precision.


Read More

Supercharge your Cluster Autoscaling with VPA

5/13/2025

 
Choosing accurate CPU and memory request values for Kubernetes workloads is a difficult endeavor. This difficulty leads application developers to overprovision their workloads to ensure that application performance will not be affected, which can increase cloud costs and waste resources. Workloads can also be inadvertently underprovisioned, which can negatively affect application performance and potentially even lead to service disruptions.

In this blog, we describe how the Kubernetes Vertical Pod Autoscaler (VPA) can be used in conjunction with Luna, a powerful cluster autoscaler, so that Kubernetes workloads are right-sized by VPA and the cluster’s nodes are right-sized by Luna, resulting in cost-effective and performant operations.



Read More

Fun with Spot

4/24/2025

 

Experiences using Luna Smart Autoscaling of Public Cloud Kubernetes Clusters for Offline Inference using GPUs

Offline inference is well-suited to take advantage of spot GPU capacity in public clouds.  However, obtaining spot and on-demand GPU instances can be frustrating, time-consuming, and costly.  The Luna smart cluster autoscaler scales cloud Kubernetes (K8s) clusters with the least-expensive available spot and on-demand instances, in accordance with constraints that can include GPU SKU and count as well as maximum estimated hourly cost.  In this blog, we share recent experiences with offline inference on GKE, AKS, and EKS clusters using Luna.  Luna efficiently handled the toil of finding the lowest-priced available spot GPU instances, reducing estimated hourly costs by 38-50% versus an on-demand baseline and turning an often tedious task into bargain-jolt fun.

Introduction

Applications such as query/response chatbots are handled via online serving, in which each input prompt is provided in real time to the model running on one or more GPU workers.  Automatic instance allocation for online serving presents efficiency challenges.  Real-time response is sensitive to scaling latency during usage spikes and can be impacted by spot reclamation and replacement.  Also, peak online serving usage often overlaps with peak cloud resource usage, affecting the available capacity for GPU instances.  We've previously discussed aspects of using the Luna smart cluster autoscaler to automatically allocate instances for online serving, e.g., scaling Helix to handle ML load and reducing deploy time for new ML workers.

Read More

Reducing Deploy Time for LLM Serving on Cloud Kubernetes with Luna Smart Autoscaler

1/28/2025

 

OVERVIEW

26 minutes!  26 long minutes was our wait time in one example case for our chatbot to be operational.  Our LLM Kubernetes service runs in the cloud, and we found that deploying it from start to finish took between 13 and 26 minutes, which negatively impacted our agility and our happiness!  Spinning up the service does involve a lot of work: creating the GPU node, pulling the large container image, and downloading the files containing the LLM weights to run our model.  But we hoped we could make some simple changes to speed it up, and we did.  In this post you will learn how to do just-in-time provisioning of an LLM service in cloud Kubernetes at deployment times that won't bum you out.

We share our experience with straightforward, low-cost, off-the-shelf methods to reduce container image fetch and model download times on EKS, GKE, and AKS clusters running the Luna smart cluster autoscaler.  Our example LLM serving workload is a KubeRay RayService using vLLM to serve an open-source model downloaded from HuggingFace.  We measured deploy-time improvements of up to 60%.


Read More

EKS Auto Mode vs. Luna: Choosing the Right Scaling Strategy for Your Kubernetes Workloads

1/14/2025

 
Running Kubernetes on AWS using Elastic Kubernetes Service (EKS) offers a robust platform for container orchestration, but the challenge of managing the underlying compute infrastructure persists. This limitation can be addressed through various approaches, including the fully managed simplicity of EKS Auto Mode or the granular control offered by an intelligent Kubernetes cluster autoscaler like Luna. In this post, we’ll explore the advantages of each, helping you choose the best scaling strategy for your workloads.

Introduction

EKS Auto Mode is a fully managed solution aimed at reducing operational complexity for Kubernetes clusters on AWS. It automates essential tasks like node provisioning, scaling, and lifecycle management, offering an ideal entry point for teams new to EKS or operating simpler workloads.

In contrast, compute autoscalers like Luna offer greater flexibility and customization, allowing you to optimize your infrastructure for the demands of complex and/or resource-intensive workloads.


Read More

Helix + Luna: Efficient GenAI for Serious People

11/15/2024


 
Why Helix + Luna?
Helix allows companies to leverage LLMs while retaining complete control over data and infrastructure. By utilizing Helix, organizations can connect their data, either locally or through APIs, to powerful AI models without transferring sensitive information outside of their ecosystem. Helix’s solution empowers companies to deploy open-source LLMs on their own resources, including cloud-based Kubernetes (K8s) clusters. This approach provides the scalability and resilience of cloud infrastructure with the privacy and control of on-premises deployment. Designed to meet the needs of modern enterprises, Helix enables robust AI integration, whether for enhancing customer interactions, streamlining internal workflows, or extracting valuable insights from vast data sets.

Elotl Luna is a smart Kubernetes cluster autoscaler that runs on the 4 major K8s cloud platforms, i.e., AWS EKS, GCP GKE, Azure AKS, and Oracle OKE.  It adds and removes right-sized compute instances in cloud Kubernetes clusters as needed, thereby reducing operational complexity and preventing wasted spend. Luna is ideally suited for deploying AI/ML platforms running bursty workloads that need special, expensive resources such as GPUs.
 
Combining Helix with Luna in a cloud Kubernetes cluster adds dynamic resource management to Helix, allowing compute instances to be allocated on demand to handle the Helix workload, and later deallocated when no longer needed.  This flexible scaling improves efficiency and reduces costs, particularly important when expensive cloud GPU resources are used.


Read More

Mastering Kubernetes Autoscaling: How Luna Combines Bin-Packing and Bin-Selection for Optimal Cluster Scaling Efficiency

10/3/2024

 
In the world of Kubernetes, understanding the basics of pods and nodes is important, but to truly optimize your infrastructure, you need to delve deeper. The real game-changer? Cluster Autoscalers. These tools dynamically adjust the size of your cluster, ensuring you meet workload demands without over-provisioning resources. But while many autoscalers focus solely on bin-packing, Luna takes it a step further with its innovative bin-selection feature, delivering an all-encompassing solution for workload management and cost efficiency.

In this blog, we will explore both bin-packing and bin-selection, two essential strategies for Kubernetes autoscaling. By leveraging Luna, you can maximize efficiency, minimize waste, and keep costs under control, all while handling the complexities of varying workload sizes and resource requirements. Let’s dive in!

What is Bin-Packing in Kubernetes?

Bin-packing is the default approach for optimizing pod placement in Kubernetes. The concept is simple: pack as many items (pods) into as few bins (nodes) as possible, maximizing resource utilization and minimizing the number of nodes required.
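
To make the idea concrete, here is a toy first-fit-decreasing bin-packing sketch over pod CPU requests; the numbers are invented, and this illustrates the general technique rather than the algorithm Luna or the Kubernetes scheduler actually uses.

    # Toy bin-packing illustration (first-fit decreasing): place pod CPU
    # requests onto as few fixed-size nodes as possible. Invented values; not
    # Luna's or the Kubernetes scheduler's actual algorithm.
    def first_fit_decreasing(pod_requests, node_capacity):
        nodes = []        # remaining capacity of each node
        placements = []   # (request, node index) pairs
        for req in sorted(pod_requests, reverse=True):
            for i, free in enumerate(nodes):
                if req <= free:
                    nodes[i] -= req
                    placements.append((req, i))
                    break
            else:  # no existing node fits: "scale up" by adding a node
                nodes.append(node_capacity - req)
                placements.append((req, len(nodes) - 1))
        return len(nodes), placements

    # These requests fit on 3 nodes of 4 CPUs each.
    count, plan = first_fit_decreasing([2.0, 1.5, 3.0, 0.5, 1.0, 2.5], node_capacity=4.0)
    print(count, plan)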


Read More

Luna Hot Node Mitigation: A Chill Pill to Cure Pod Performance Problems

8/21/2024

 
When nodes in a cluster become over-utilized, pod performance suffers. Avoiding or addressing hot nodes can reduce workload latency and increase throughput.  In this blog, we present two Ray Machine Learning serving experiments that show the performance benefit of Luna’s new Hot Node Mitigation (HNM) feature. With HNM enabled, Luna demonstrated a reduction in latency relative to the hot node runs: 40% in the first experiment and 70% in the second. It also increased throughput: 30% in the first and 40% in the second. We describe how the Luna smart cluster autoscaler with HNM addresses hot node performance issues by triggering the allocation and use of additional cluster resources.

INTRODUCTION

A pod's CPU and memory resource requests express its minimum resource allocations.  The Kubernetes (K8s) scheduler uses these values as constraints for placing the pod on a node, leaving the pod pending when the settings cannot be respected.  Cloud cluster autoscalers look at these values on pending pods to determine the amount of resources to add to a cluster.

A pod configured with both CPU and memory requests, and with limits equal to those requests, is in the Guaranteed QoS class.  A K8s cluster hosting any non-guaranteed pods runs the risk that some nodes in the cluster could become over-utilized when such pods have CPU or memory usage bursts. Bursting pods running on hot nodes can have performance problems.  A bursting pod’s attempts to use CPU above its CPU resource request can be throttled.  And its attempts to use memory above its memory resource request can cause the pod to be killed.  The K8s scheduler can worsen the situation by continuing to schedule pods onto hot nodes.
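
For reference, a container stanza that keeps its pod in the Guaranteed QoS class sets both requests and limits and makes them equal; the sketch below expresses this as a Python dict mirroring a pod spec, with placeholder names and quantities.

    # Sketch of a Guaranteed-QoS container: CPU and memory requests are set and
    # the limits equal the requests. Python dict mirroring a Kubernetes pod
    # spec; image, name, and quantities are placeholders.
    guaranteed_container = {
        "name": "serving",
        "image": "example.com/ray-serve:latest",
        "resources": {
            "requests": {"cpu": "2", "memory": "4Gi"},
            "limits":   {"cpu": "2", "memory": "4Gi"},  # equal to requests
        },
    }
    # Containers that set requests below their limits (or omit them) put the
    # pod in the Burstable or BestEffort class, which is what allows usage
    # bursts to drive a node hot.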

Read More

Right Place, Right Size: Using an Autoscaler-Aware Multi-Cluster Kubernetes Fleet Manager for ML/AI Workloads

7/11/2024

 

Introduction

Are you tired of juggling multiple Kubernetes clusters, desperately trying to match your ML/AI workloads to the right resources? A smart K8s fleet manager like the Elotl Nova policy-driven multi-cluster orchestrator simplifies the use of multiple clusters by presenting a single K8s endpoint for workload submission and by choosing a target cluster for the workload based on placement policies and candidate clusters’ available capacity.  Nova is autoscaler-aware, detecting whether workload clusters are running either the K8s cluster autoscaler or the Elotl Luna intelligent cluster autoscaler.

In this blog, we examine how Nova policies combined with its autoscaler-awareness can be used to achieve a variety of "right place, right size" outcomes for several common ML/AI GPU workload scenarios. When Nova and Luna team up, you can:
  1. Reduce the latency of critical ML/AI workloads by scheduling on available GPU compute.
  2. Reduce your bill by directing experimental jobs to sunk-cost clusters.
  3. Reduce your costs via policies that select GPUs with the desired price/performance.


Read More

Using NVIDIA GPU Time-slicing in Cloud Kubernetes Clusters with the Luna Smart Cluster Autoscaler

6/25/2024

 

Introduction

Kubernetes (K8s) workloads are given exclusive access to their allocated GPUs by default.  With NVIDIA GPU time-slicing, GPUs can be shared among K8s workloads by interleaving their GPU use.  For cloud K8s clusters running non-demanding GPU workloads, configuring NVIDIA GPU time-slicing can significantly reduce GPU costs. Note that NVIDIA GPU time-slicing is intended for non-production test/dev workloads, as it does not enforce memory and fault isolation.

Using NVIDIA GPU time-slicing in a cloud Kubernetes cluster with a cluster autoscaler (CA) that is aware of the time-slicing configuration can significantly reduce costs. A time-slice-aware "smart" CA prevents initial over-allocation of instances, optimizes instance selection, and reduces the risk of exceeding quotas and capacity limits.  Also, on GKE, where GPU time-slicing is expected to be configured at the control plane level, a smart CA facilitates using time-slicing on GPU resources that are dynamically allocated.
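
As a hedged illustration, the NVIDIA device plugin’s time-slicing configuration takes roughly the following shape, written here as a Python dict mirroring its YAML config file; the replica count is just an example, and how the config is delivered (a ConfigMap for the plugin, or node-pool/control-plane settings on GKE) varies by platform.

    # Rough shape of an NVIDIA k8s-device-plugin time-slicing config, as a
    # Python dict mirroring the plugin's YAML file. With replicas=4, each
    # physical GPU is advertised as 4 schedulable nvidia.com/gpu resources that
    # share the device by time-slicing. The replica count is an example only.
    time_slicing_config = {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [
                    {"name": "nvidia.com/gpu", "replicas": 4},
                ]
            }
        },
    }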



Read More

How to run the OpenTelemetry collector as a Kubernetes sidecar

6/12/2024

 
At Elotl we develop Luna, an intelligent cluster autoscaler for Kubernetes. Luna is deployed on customers' clusters and helps scale compute resources up and down to optimize cost.

Luna operates in environments where direct access isn’t always available. To make diagnosis and performance monitoring possible in such environments, we have introduced the option for customers to securely send their Luna logs and metrics to our advanced log storage appliance. This empowers us to enhance our support capabilities, providing even more effective assistance to our customers.

OpenTelemetry is fast becoming the standard for collecting metrics and logs in Kubernetes environments. We opted to run the OpenTelemetry collector as a sidecar for the Luna cluster autoscaler: since it gathers and sends the logs of a single pod, running it as a sidecar was a perfect match.
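
A minimal sketch of the kind of collector pipeline such a sidecar might run is shown below, written as a Python dict mirroring the collector’s YAML config; the filelog receiver (from the contrib distribution), the OTLP/HTTP exporter, the log path, and the endpoint are assumptions for illustration, not Luna’s actual configuration.

    # Minimal sketch of an OpenTelemetry Collector config for a log-shipping
    # sidecar, as a Python dict mirroring the collector's YAML file. The
    # receiver/exporter choice, log path, and endpoint are assumptions for
    # illustration, not Luna's actual setup.
    otel_collector_config = {
        "receivers": {
            "filelog": {"include": ["/var/log/luna/*.log"]},
        },
        "exporters": {
            "otlphttp": {"endpoint": "https://logs.example.com:4318"},
        },
        "service": {
            "pipelines": {
                "logs": {"receivers": ["filelog"], "exporters": ["otlphttp"]},
            }
        },
    }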


Read More

Unleashing the Power of ARM: Elevating Your Kubernetes Workloads with ARM Nodes

4/29/2024

 
The recent surge in ARM processor capabilities has sparked a wave of exploration beyond their traditional mobile device domain. This blog explains why you may want to consider using ARM nodes for your Kubernetes workloads. We'll identify the potential benefits of leveraging ARM nodes for containerized deployments while acknowledging the inherent trade-offs and the scenarios where x86-64 architectures may perform better and thus continue to be a better fit. Lastly, we'll describe a seamless way to add ARM nodes to your Kubernetes clusters.

In this blog, for the sake of clarity and brevity, I will be using the term 'ARM' to refer to ARM64 or ARM 64-bit processors, while 'x86' or 'x86-64' will be used interchangeably to denote Intel or AMD 64-bit processors.

What Kubernetes Workloads Tend To Be Ideal for ARM Processors?

Inference-heavy tasks:

While the computations involved in Deep Learning training typically require GPUs for acceptable performance, DL inference is less computationally intense.  Tasks that apply pre-trained models for DL regression or classification can benefit from ARM's power/performance relative to GPU or x86-64 systems. We presented data on running inference on ARM64 in our Scale20x talk.
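
For reference, steering a workload onto ARM nodes typically comes down to the standard kubernetes.io/arch node label; the sketch below shows this as pod-spec fields in a Python dict, with placeholder names, and assumes the container image is built for arm64 (or is multi-arch).

    # Sketch: schedule a pod onto ARM nodes via the standard kubernetes.io/arch
    # label, expressed as pod-spec fields in a Python dict. The image is a
    # placeholder and must be built for arm64 (or be multi-arch).
    arm_pod_spec = {
        "nodeSelector": {"kubernetes.io/arch": "arm64"},
        "containers": [
            {
                "name": "inference",
                "image": "example.com/inference:multiarch",
                "resources": {"requests": {"cpu": "4", "memory": "8Gi"}},
            }
        ],
    }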

Read More

The Benefits of Cycling Kubernetes Nodes: Optimizing Performance, Reliability, and Security

4/9/2024

 
Wondering whether cycling out older Kubernetes nodes periodically is a good idea? In the world of Kubernetes administration, the practice of rotating nodes often takes a backseat, even though it holds considerable advantages. While it's true that node cycling isn't universally applicable, it's worth exploring its merits for your environment. In this article, I will delve into many of the compelling reasons why considering node rotation might be beneficial for your clusters. We'll explore the advantages of node rotation in Kubernetes and how it contributes to resource optimization, fault tolerance, security, and performance improvements.

Why might someone think cycling of Kubernetes nodes is unnecessary? One reason for this could be a misconception about the stability of Kubernetes clusters. In environments where nodes rarely fail or resource usage remains relatively consistent, there might be a tendency to prioritize other tasks over node cycling. Additionally, the perceived complexity of implementing node rotation strategies, particularly in large-scale or production environments, could dissuade teams from actively considering it. Some teams might also be unaware of the potential performance gains and reliability improvements that can result from regular node cycling. However, despite these challenges or misconceptions, it's crucial to recognize that neglecting node rotation can lead to issues such as resource exhaustion, reduced fault tolerance, security vulnerabilities, difficulties upgrading to newer versions, and degraded performance over time. By acknowledging the importance of node cycling and implementing proactive strategies, administrators and DevOps teams can ensure the long-term health, resilience, and efficiency of their Kubernetes infrastructure. So, without delay, let's delve into the specifics.



Read More

Deep Learning Training with Ray and Ludwig using Elotl Luna

2/22/2024

 
In this brief summary blog, we delve into the intriguing realm of GPU cost savings in the cloud through the use of Luna, an Intelligent Autoscaler. If you're passionate about harnessing the power of Deep Learning (DL) while optimizing expenses, this summary is for you. Join us as we explore how innovative technologies are revolutionizing the landscape of resource management in the realm of Deep Learning. Let's embark on a journey where efficiency meets intelligence, promising both technical insights and a practical solution.

Deep Learning has transformed, and continues to transform, many industries such as AI, Healthcare, Finance, Retail, E-commerce, and many others. Some of the challenges with DL include its high cost and operational overhead:
  1. Compute Costs: Deep learning models require significant computational resources, which lead to high costs, especially for complex or large-scale projects. This is even more true when the compute remains provisioned when it’s not needed.
  2. Instance Management: Managing cloud instances for training, inference, and experimentation creates operational overhead. This includes provisioning and configuring virtual machines, monitoring resource usage, and optimizing instance types for performance and cost efficiency.
  3. Infrastructure Scaling: Scaling deep learning workloads in the cloud involves dynamically adjusting compute resources to meet demand. This requires optimizing resource allocation to minimize costs while ensuring sufficient capacity.

Open-source platforms like Ray and Ludwig have broadened DL accessibility, yet DL models’ intensive GPU resource demands present financial hurdles. Addressing this, Elotl Luna emerges as a solution, streamlining compute for Kubernetes clusters without the need for manual scaling, which often results in wasted spend.
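
As a hedged sketch of how declarative such training can be, the snippet below trains a small model with Ludwig’s Python API, assuming a recent Ludwig release; the feature names, dataset, and config are placeholders rather than the experiment from this post.

    # Hedged sketch of declarative training with Ludwig's Python API (assumes a
    # recent Ludwig release). Feature names, dataset path, and config are
    # placeholders, not this blog's actual experiment.
    from ludwig.api import LudwigModel

    config = {
        "input_features": [{"name": "review_text", "type": "text"}],
        "output_features": [{"name": "sentiment", "type": "category"}],
        "trainer": {"epochs": 3},
    }

    model = LudwigModel(config)
    model.train(dataset="reviews.csv")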


Read More

A Guide to Disaster Recovery for FerretDB with Elotl Nova on Kubernetes

2/12/2024

 
Originally published on blog.ferretdb.io
Running a database without a disaster recovery process can result in loss of business continuity, leading to revenue and reputation loss for a modern business.

Cloud environments provide a vast set of choices in storage, networking, compute, load-balancing and other resources to build out DR solutions for your applications. However, these building blocks need to be architected and orchestrated to build a resilient end-to-end solution. Ensuring continuous operation of the databases backing your production apps is critical to avoid losing your customers' trust.

Successful disaster recovery requires:
  • Reliable components to automate backup and recovery
  • A watertight way to identify problems
  • A list of steps to revive the database
  • Regular testing of the recovery process

This blog post shows how to automate these four aspects of disaster recovery using FerretDB, Percona PostgreSQL and Nova. Nova automates parts of the recovery process, reducing mistakes and getting your data back online faster.

Read More

Cloud GPU Allocation Got You Down? Elotl Luna to the Rescue!

2/8/2024

 
How do I efficiently run my AI or Machine Learning (ML) workloads in my Kubernetes clusters?

Operating Kubernetes clusters with GPU compute manually presents several challenges, particularly in the allocation and management of GPU resources. One significant pain point is the potential for wasted spend, as manually allocated GPUs may remain idle during periods of low workload. In dynamic or bursty clusters, predicting the optimal GPU requirements becomes challenging, leading to suboptimal resource utilization and increased costs. Additionally, manual allocation necessitates constant monitoring, requiring administrators to stay aware of GPU availability in clusters spread across different zones or regions. Once the GPU requirements are determined for a given workload, the administrator needs to manually add nodes when demand surges and remove them during periods of inactivity.

There are many GPU types, each with different capabilities, running on different node types. The combination of these factors makes manual GPU node management increasingly convoluted. Different workloads may require specific GPU models, leading to complexities in node allocation. Manually ensuring the correct GPU nodes for diverse workloads becomes a cumbersome task, especially when dealing with multiple applications with varying GPU preferences. This adds another layer of operational overhead, demanding detailed knowledge of GPU types and availability, and continuous adjustments to meet workload demands.

Luna, an intelligent node autoscaler, addresses these pain points by automating GPU node allocation based on workload demands. Luna is aware of GPU availability and, as such, can dynamically choose and allocate the needed GPU nodes, eliminating the need for manual intervention. This optimizes resource utilization and reduces wasted spend by scaling GPU resources in line with the workload. Moreover, Luna can allocate specific nodes as defined by the workload requirements, ensuring precise resource allocation tailored to the application’s needs. This makes Luna perfectly suited for the most complex compute jobs like AI and ML workloads.

Furthermore, Luna’s core functionality includes the automatic allocation of alternative GPU nodes in cases where preferred GPUs are unavailable, bolstering its flexibility and resilience. This ensures that workloads with specific GPU preferences can seamlessly transition to available alternatives, maintaining uninterrupted operation. Through annotations on the workload, users can specify cloud instance types to use or avoid, either by instance family or via regular expressions, along with desired GPU SKUs. This capability enables dynamic allocation based on GPU availability and workload demands, simplifying cluster management and promoting efficient scaling and resource utilization without the need for constant manual adjustments.
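
Purely as a hypothetical illustration of the idea, such hints might be attached to a workload roughly as shown below; the annotation keys and values are invented placeholders, not Luna’s actual annotation names, which are defined in its documentation.

    # Hypothetical illustration only: the annotation keys/values below are
    # invented placeholders, not Luna's real annotation names or syntax.
    pod_metadata = {
        "annotations": {
            "example.elotl/instance-family": "g5",          # hypothetical: preferred instance family
            "example.elotl/instance-exclude": ".*metal.*",  # hypothetical: regex of types to avoid
            "example.elotl/gpu-sku": "nvidia-a10g",         # hypothetical: desired GPU SKU
        }
    }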

Lastly, the advantages of Luna extend beyond resource optimization and workload adaptability within a single cloud. When organizations leverage various cloud providers, flexibility is paramount. An intelligent autoscaler designed to support GPU management across multiple cloud providers empowers users with the freedom to choose the most suitable cloud platform for their specific needs. With Luna, enterprises are not locked into a single cloud provider, offering them the agility to transition workloads seamlessly between different cloud environments based on cost-effectiveness, performance, or specific features. Currently, Luna supports four cloud providers: Amazon AWS with EKS, Google Cloud with GKE, Microsoft Azure with AKS, and Oracle Cloud Infrastructure with OKE. By providing a unified and agnostic approach to GPU resource management, Luna becomes a strategic asset, enabling organizations to harness the benefits of diverse cloud ecosystems without compromising efficiency or incurring cloud vendor lock-in.

In summary, manually managing GPU compute in Kubernetes clusters poses challenges related to wasted spend, manual addition, monitoring, and removal of nodes. Luna addresses these pain points by:
  • Streamlining GPU node allocation according to workload demands
  • Optimizing resource utilization by dynamically choosing and allocating nodes
  • Adapting to fluctuations in GPU availability seamlessly
  • Unifying operations across multiple clusters and cloud providers: Amazon EKS, Google GKE, Azure AKS, and Oracle OKE

Luna simplifies cluster node management, reduces operational overhead, and ensures efficient GPU resource utilization.

To delve deeper into Luna's powerful features and capabilities, explore the Luna product page for details. For step-by-step guidance, consult our Documentation. Ready to experience the seamless management of GPU workloads firsthand? Try Luna today with our free trial and witness the efficiency and flexibility it brings to your cloud environments.

Author:
Justin Willoughby (Principal Solutions Architect, Elotl)

Contributors:
Henry Precheur (Senior Staff Engineer, Elotl)
Anne Holler (Chief Scientist, Elotl)

Luna 1.0.0 is out

2/6/2024

 
The Elotl team is thrilled to announce a major milestone in our journey: the release of Luna version 1.0.0. Luna is an intelligent Kubernetes cluster autoscaler that optimizes cost, simplifies operations, and supports four public cloud providers: Amazon EKS, Google GKE, Microsoft AKS, and Oracle OKE.
While some might associate version 1.0.0 with potential hiccups, rest assured, this release is a testament to our commitment to excellence and stability. We’ve diligently worked to ensure that this version not only meets but exceeds expectations.

Why Luna Version 1.0.0 is a Milestone:

  • Widened Horizon: Luna has been rigorously tested and optimized, making it suitable for a broad range of applications.
  • Trusted in Production: Version 1.0.0 builds upon the rock-solid foundation of its predecessor, version 0.7.4, which has been successfully running in diverse production clusters.

Give it a try

To learn more about Luna, check out the Luna product page; you can also download the trial version of Luna or read the documentation.
We dedicated extensive effort to building Luna into a robust cluster autoscaler, ensuring that every dollar brings optimal value. Luna is designed to enhance the efficiency of your Kubernetes workloads and streamline the scaling operations across multiple cloud environments. We encourage you to explore Luna, especially for clusters handling substantial, dynamic, or bursty workloads.


© 2025 Elotl, Inc.