Elotl
  • Home
  • Platform
    • Luna
    • Nova
  • Resources
    • Blog
    • YouTube
    • Podcast
    • Meetup
  • Usecases
    • GenAI
  • Company
    • Team
    • Careers
    • Contact
    • News
  • Free Trial
    • Luna Free Trial
    • Nova Free Trial

Blog

Reducing Deploy Time for LLM Serving on Cloud Kubernetes with Luna Smart Autoscaler

1/28/2025

 

OVERVIEW

26 minutes! 26 long minutes was our wait time in one example case for our chatbot to be operational. Our LLM Kubernetes service runs in the cloud, and we found that deploying it from start to finish took between 13 and 26 minutes, which hurt both our agility and our happiness! Spinning up the service does involve a lot of work: creating the GPU node, pulling the large container image, and downloading the files containing the LLM weights for our model. But we hoped a few simple changes could speed it up, and they did. In this post, you'll learn how to do just-in-time provisioning of an LLM service in cloud Kubernetes at deployment times that won't bum you out.

We share our experience with straightforward, low-cost, off-the-shelf methods to reduce container image fetch and model download times on EKS, GKE, and AKS clusters running the Luna smart cluster autoscaler.  Our example LLM serving workload is a KubeRay RayService using vLLM to serve an open-source model downloaded from HuggingFace.  We measured deploy-time improvements of up to 60%.
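For orientation, a workload of the kind described above might be declared roughly as follows. This is a minimal illustrative sketch, not the manifest from the post: the resource name, image tag, Serve import path, and resource requests are all assumptions.

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: vllm-serve                        # hypothetical name, for illustration
spec:
  serveConfigV2: |
    applications:
      - name: llm
        import_path: serve_app:deployment # assumed module exposing a vLLM-backed Serve app
        runtime_env:
          pip: ["vllm"]
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0 # illustrative image tag
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 1
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.9.0
                resources:
                  limits:
                    nvidia.com/gpu: 1     # the GPU request is what drives the autoscaler to provision a GPU node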



EKS Auto Mode vs. Luna: Choosing the Right Scaling Strategy for Your Kubernetes Workloads

1/14/2025

 
Running Kubernetes on AWS using Elastic Kubernetes Service (EKS) offers a robust platform for container orchestration, but the challenge of managing the underlying compute infrastructure persists. That challenge can be addressed in several ways, from the fully managed simplicity of EKS Auto Mode to the granular control offered by an intelligent Kubernetes cluster autoscaler like Luna. In this post, we’ll explore the advantages of each, helping you choose the best scaling strategy for your workloads.

Introduction

EKS Auto Mode is a fully managed solution aimed at reducing operational complexity for Kubernetes clusters on AWS. It automates essential tasks like node provisioning, scaling, and lifecycle management, offering an ideal entry point for teams new to EKS or operating simpler workloads.

In contrast, compute autoscalers like Luna offer greater flexibility and customization, allowing you to optimize your infrastructure for the demands of complex and/or resource-intensive workloads.




© 2025 Elotl, Inc.