Why Helix + Luna?

Helix enables companies to leverage LLMs while retaining complete control over their data and infrastructure. With Helix, organizations can connect their data, either locally or through APIs, to powerful AI models without transferring sensitive information outside their ecosystem. Helix empowers companies to deploy open-source LLMs on their own resources, including cloud-based Kubernetes (K8s) clusters. This approach combines the scalability and resilience of cloud infrastructure with the privacy and control of on-premises deployment. Designed to meet the needs of modern enterprises, Helix enables robust AI integration, whether for enhancing customer interactions, streamlining internal workflows, or extracting valuable insights from vast data sets.

Elotl Luna is a smart Kubernetes cluster autoscaler that runs on the four major K8s cloud platforms: AWS EKS, GCP GKE, Azure AKS, and Oracle OKE. It adds and removes right-sized compute instances from cloud Kubernetes clusters as needed, reducing operational complexity and preventing wasted spend. Luna is ideally suited to AI/ML platforms running bursty workloads that need special, expensive resources such as GPUs.

Combining Helix with Luna in a cloud Kubernetes cluster adds dynamic resource management to Helix: compute instances are allocated on demand to handle the Helix workload and deallocated when no longer needed. This flexible scaling improves efficiency and reduces costs, which is particularly important when expensive cloud GPU resources are involved.

Helix + Luna Demo

This video shows the combination of Helix and Luna in action. In this demo, Helix was installed on a GKE cluster initially composed of 3 e2-medium CPU instances, to run Helix and Luna, and 1 g2-standard-16 L4 GPU instance with a 150 GB disk, for the LLM model, using these instructions. The Luna free trial version was used, with its gcp.diskSizeGb option set to 150.
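The initial cluster layout described above could be provisioned roughly as follows. This is a hedged sketch, not the demo's exact commands: the cluster and node pool names (helix-demo, gpu-pool) are illustrative, and flags follow the standard gcloud CLI for GKE.

```shell
# Default pool: 3 e2-medium CPU nodes, for running Helix and Luna components.
gcloud container clusters create helix-demo \
  --machine-type=e2-medium \
  --num-nodes=3

# GPU pool: 1 g2-standard-16 node (comes with an NVIDIA L4 GPU) with a
# 150 GB boot disk, for serving the LLM model.
gcloud container node-pools create gpu-pool \
  --cluster=helix-demo \
  --machine-type=g2-standard-16 \
  --accelerator=type=nvidia-l4,count=1 \
  --disk-size=150 \
  --num-nodes=1
```

The 150 GB disk matches the Luna gcp.diskSizeGb=150 setting mentioned above, so that Luna-provisioned GPU nodes have the same disk capacity as the statically created one.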
After setup, the my-helix-runner deployment was edited to set its replicas to 0 and to add to its pod template the Luna management label elotl-luna=true and the instance type selector annotation node.elotl.co/instance-type-regexp: g2-standard-16. Next, the statically allocated g2-standard-16 node was removed from the cluster, since Luna would now handle allocating GPU nodes in response to scaling of the Helix runner replicas. Then the command kubectl scale --replicas=1 deployment.apps/my-helix-runner was used to set the replica count to 1, and in response Luna added a new node to the K8s cluster. Any further changes to the Helix replica count trigger corresponding Luna node add or delete operations.

Try Helix + Luna!

We want you to benefit from the power of Helix to handle your GenAI workloads in your cloud K8s cluster, along with the power of Luna to right-size the resources in your cluster. We plan to hold a workshop on doing this in the near future. Please reach out to Tamao at [email protected] if you'd like to attend, or if you'd like to get started on this in the meantime.

Authors: Anne Holler (Elotl), Chris Sterry (Helix), Luke Marsden (Helix)
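Appendix: for reference, the demo's Luna-enablement steps can be sketched as the commands below. This is a hedged recap, not a transcript of the demo; the deployment name and label/annotation come from the text above, while the node and pool names are placeholders.

```shell
# Stop the Helix runner before handing GPU node management to Luna.
kubectl scale --replicas=0 deployment.apps/my-helix-runner

# Add the Luna management label and instance-type selector annotation to the
# runner's pod template (strategic merge patch).
kubectl patch deployment my-helix-runner -p '{
  "spec": {"template": {"metadata": {
    "labels": {"elotl-luna": "true"},
    "annotations": {"node.elotl.co/instance-type-regexp": "g2-standard-16"}
  }}}
}'

# Remove the statically allocated GPU node; on GKE this is typically done by
# deleting its node pool (pool name here is a placeholder).
gcloud container node-pools delete gpu-pool --cluster=helix-demo

# Scale the runner back up; Luna provisions a matching g2-standard-16 node.
kubectl scale --replicas=1 deployment.apps/my-helix-runner
```

Scaling the runner back to 0 would similarly let Luna deallocate the GPU node once it is no longer needed.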