Elotl
  • Home
  • Platform
    • Luna
    • Nova
  • Resources
    • Blog
    • YouTube
    • Podcast
    • Meetup
  • Use Cases
    • GenAI
  • Company
    • Team
    • Careers
    • Contact
    • News
  • Free Trial
    • Luna Free Trial
    • Nova Free Trial

Blog

Building an Elastic GPU Cluster with the KAI Scheduler and Luna Autoscaler

5/28/2025

 
When managing machine learning workloads at scale, efficient GPU scheduling becomes critical. The KAI Scheduler introduces a structured approach to resource allocation by organizing jobs into queues, operating under the assumption of a fixed pool of GPU resources in the cluster. For those unfamiliar with KAI terminology, a "job" here refers to a unit of scheduling work in KAI's own abstraction, not to a Kubernetes Job resource (the batch/v1 kind used for running finite, batch-style workloads). Each queue can be assigned limits and quotas, allowing administrators to control how resources are distributed across teams, projects, or workloads. This model ensures fair usage and predictability, but it also means that when demand exceeds supply, jobs sit idle waiting for resources to become available, and when supply exceeds demand, unnecessary costs are incurred.

This is where the real strength of the KAI Scheduler can shine: pairing it with Luna, an intelligent autoscaler. Together, they make the system highly elastic, able to dynamically add GPU nodes only when truly needed and scale them back down to optimize efficiency. Instead of relying on a static pool of GPUs, the cluster can grow to meet active demand, but only up to what is necessary and permitted by the configured queue limits and quotas. It's worth noting that Luna doesn't indiscriminately add nodes; it works intelligently alongside KAI, ensuring that scaling decisions respect organizational boundaries and cost controls. Beyond scaling decisions, Luna offers settings to guide GPU instance selection, adding another layer of precision.
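As a rough sketch of how a workload enters a KAI queue: the pod opts into the KAI Scheduler via `schedulerName` and names its target queue with a label. The queue name `team-a` and the container image are illustrative, and the label key follows the KAI Scheduler quickstart conventions; verify both against your installed version.

```yaml
# Illustrative pod submitted to a KAI queue (names are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: train-job
  labels:
    kai.scheduler/queue: team-a   # routes the pod to team-a's queue
spec:
  schedulerName: kai-scheduler    # hand scheduling to KAI, not the default scheduler
  containers:
  - name: trainer
    image: my-training-image:latest   # hypothetical training image
    resources:
      limits:
        nvidia.com/gpu: 1         # GPU count charged against the queue's quota
```

If `team-a` has exhausted its quota, the pod waits in the queue; with Luna in the cluster, a pending pod that is within quota triggers a node scale-up instead of waiting on a static pool.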


Read More

Supercharge your Cluster Autoscaling with VPA

5/13/2025

 
Choosing accurate CPU and memory request values for Kubernetes workloads is a difficult endeavor. As a result, application developers often overprovision their workloads to ensure that application performance will not be affected, which increases cloud costs and wastes resources. Workloads can also be inadvertently underprovisioned, which can degrade application performance and potentially even lead to service disruptions.

In this blog, we describe how the Kubernetes Vertical Pod Autoscaler (VPA) can be leveraged in conjunction with Luna, a powerful cluster autoscaler, to ensure that Kubernetes workloads are right-sized by VPA and that the cluster and its nodes are right-sized by Luna, resulting in cost-effective and performant operations.
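A minimal sketch of the VPA side of this pairing, assuming a Deployment named `web-app` (the name and the min/max bounds are illustrative):

```yaml
# VPA object that observes web-app's usage and rewrites its
# CPU/memory requests; in "Auto" mode, pods are evicted and
# re-created with the updated requests, which Luna then sizes nodes for.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical workload
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "2"
        memory: 4Gi
```

The bounds keep VPA's recommendations within a sane range while Luna handles adding or removing the nodes the resized pods need.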



Read More

Fun with Spot

4/24/2025

 

Experiences using Luna Smart Autoscaling of Public Cloud Kubernetes Clusters for Offline Inference using GPUs

Offline inference is well-suited to take advantage of spot GPU capacity in public clouds.  However, obtaining spot and on-demand GPU instances can be frustrating, time-consuming, and costly.  The Luna smart cluster autoscaler scales cloud Kubernetes (K8s) clusters with the least-expensive available spot and on-demand instances, in accordance with constraints that can include GPU SKU and count as well as maximum estimated hourly cost.  In this blog, we share recent experiences with offline inference on GKE, AKS, and EKS clusters using Luna.  Luna efficiently handled the toil of finding the lowest-priced available spot GPU instances, reducing estimated hourly costs by 38-50% versus an on-demand baseline and turning an often tedious task into bargain-jolt fun.
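For context, an offline-inference pod of the kind Luna provisions for might look like the sketch below. The image name is hypothetical, the `nodeSelector` key shown is GKE's spot-node label (EKS and AKS use different labels, and depending on provisioning, spot nodes may also carry a taint requiring a matching toleration), and Luna's own annotations for GPU SKU and cost constraints are not shown; consult the Luna docs for those.

```yaml
# Batch inference pod requesting one GPU, steered toward spot capacity.
apiVersion: v1
kind: Pod
metadata:
  name: batch-inference
spec:
  restartPolicy: Never                    # offline job: run to completion
  nodeSelector:
    cloud.google.com/gke-spot: "true"     # GKE-specific spot node label
  containers:
  - name: inference
    image: my-inference-image:latest      # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
```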

Introduction

Applications such as query/response chatbots are handled via online serving, in which each input and prompt is provided in real-time to the model running on one or more GPU workers.  Automatic instance allocation for online serving presents efficiency challenges.  Real-time response is sensitive to scaling latency during usage spikes and can be impacted by spot reclamation and replacement.  Also, peak online serving usage often overlaps with peak cloud resource usage, affecting the available capacity for GPU instances.  We've previously discussed aspects of using the Luna smart cluster autoscaler to automatically allocate instances for online serving, e.g., scaling Helix to handle ML load and reducing deploy time for new ML workers.

Read More

Reducing Deploy Time for LLM Serving on Cloud Kubernetes with Luna Smart Autoscaler

1/28/2025

 

OVERVIEW

26 minutes!  26 long minutes was our wait time in one example case for our chatbot to be operational.  Our LLM Kubernetes service runs in the cloud, and we found that deploying it from start to finish took between 13 and 26 minutes, which negatively impacted our agility and our happiness!  Spinning up the service does involve a lot of work: creating the GPU node, pulling the large container image, and downloading the files containing the LLM weights to run our model.  But we hoped we could make some simple changes to speed it up, and we did.  In this post you will learn how to do just-in-time provisioning of an LLM service in cloud Kubernetes at deployment times that won't bum you out.

We share our experience with straightforward, low-cost, off-the-shelf methods to reduce container image fetch and model download times on EKS, GKE, and AKS clusters running the Luna smart cluster autoscaler.  Our example LLM serving workload is a KubeRay RayService using vLLM to serve an open-source model downloaded from HuggingFace.  We measured deploy-time improvements of up to 60%.


Read More

EKS Auto Mode vs. Luna: Choosing the Right Scaling Strategy for Your Kubernetes Workloads

1/14/2025

 
Running Kubernetes on AWS using Elastic Kubernetes Service (EKS) offers a robust platform for container orchestration, but the challenge of managing the underlying compute infrastructure persists. This limitation can be addressed through various approaches, including the fully managed simplicity of EKS Auto Mode or the granular control offered by an intelligent Kubernetes cluster autoscaler like Luna. In this post, we’ll explore the advantages of each, helping you choose the best scaling strategy for your workloads.

Introduction

EKS Auto Mode is a fully managed solution aimed at reducing operational complexity for Kubernetes clusters on AWS. It automates essential tasks like node provisioning, scaling, and lifecycle management, offering an ideal entry point for teams new to EKS or operating simpler workloads.

In contrast, compute autoscalers like Luna offer greater flexibility and customization, allowing you to optimize your infrastructure for the demands of complex and/or resource-intensive workloads.


Read More

Mastering Kubernetes Autoscaling: How Luna Combines Bin-Packing and Bin-Selection for Optimal Cluster Scaling Efficiency

10/3/2024

 
In the world of Kubernetes, understanding the basics of pods and nodes is important, but to truly optimize your infrastructure, you need to delve deeper. The real game-changer? Cluster Autoscalers. These tools dynamically adjust the size of your cluster, ensuring you meet workload demands without over-provisioning resources. But while many autoscalers focus solely on bin-packing, Luna takes it a step further with its innovative bin-selection feature, delivering an all-encompassing solution for workload management and cost efficiency.

In this blog, we will explore both bin-packing and bin-selection, two essential strategies for Kubernetes autoscaling. By leveraging Luna, you can maximize efficiency, minimize waste, and keep costs under control, all while handling the complexities of varying workload sizes and resource requirements. Let’s dive in!

What is Bin-Packing in Kubernetes?

Bin-packing is the default approach for optimizing pod placement in Kubernetes. The concept is simple: pack as many items (pods) into as few bins (nodes) as possible, maximizing resource utilization and minimizing the number of nodes required.
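To make the packing arithmetic concrete, here is a minimal, hypothetical deployment. On nodes with roughly 4000m of allocatable CPU, two 1500m replicas fit on one node (3000m), while the third would push the total to 4500m, so it stays pending and prompts the autoscaler to add a node.

```yaml
# Illustrative deployment: each replica requests 1500m CPU, so at most
# two replicas bin-pack onto a node with ~4000m allocatable CPU.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: packed-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: packed-app
  template:
    metadata:
      labels:
        app: packed-app
    spec:
      containers:
      - name: app
        image: nginx:1.27          # stand-in workload
        resources:
          requests:
            cpu: 1500m
            memory: 256Mi
```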


Read More

Luna Hot Node Mitigation: A Chill Pill to Cure Pod Performance Problems

8/21/2024

 
When nodes in a cluster become over-utilized, pod performance suffers. Avoiding or addressing hot nodes can reduce workload latency and increase throughput.  In this blog, we present two Ray Machine Learning serving experiments that show the performance benefit of Luna’s new Hot Node Mitigation (HNM) feature. With HNM enabled, Luna demonstrated a reduction in latency relative to the hot node runs: 40% in the first experiment and 70% in the second. It also increased throughput: 30% in the first and 40% in the second. We describe how the Luna smart cluster autoscaler with HNM addresses hot node performance issues by triggering the allocation and use of additional cluster resources.

INTRODUCTION

A pod's CPU and memory resource requests express its minimum resource allocations.  The Kubernetes (K8s) scheduler uses these values as constraints for placing the pod on a node, leaving the pod pending when the settings cannot be respected.  Cloud cluster autoscalers look at these values on pending pods to determine the amount of resources to add to a cluster.

A pod configured with both CPU and memory requests, and with limits equal to those requests, is in QoS class guaranteed.  A K8s cluster hosting any non-guaranteed pods runs the risk that some nodes in the cluster could become over-utilized when such pods have CPU or memory usage bursts. Bursting pods running on hot nodes can have performance problems.  A bursting pod’s attempts to use CPU above its CPU resource request can be throttled.  And its attempts to use memory above its memory resource request can cause the pod to be killed.  The K8s scheduler can worsen the situation, by continuing to schedule pods onto hot nodes.
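For reference, a Guaranteed-QoS pod looks like the sketch below: every container sets both CPU and memory requests and limits, with limits equal to requests. Such a pod cannot burst above its allocation, so it cannot contribute to making a node hot.

```yaml
# Pod in the Guaranteed QoS class: requests == limits for CPU and memory.
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: nginx:1.27        # stand-in workload
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: 500m            # equal to the request: no CPU bursting
        memory: 512Mi        # equal to the request: no memory bursting
```

Any container that omits a request or sets a limit above its request drops the pod into the Burstable class, and it is these pods that HNM watches for on over-utilized nodes.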

Read More

Right Place, Right Size: Using an Autoscaler-Aware Multi-Cluster Kubernetes Fleet Manager for ML/AI Workloads

7/11/2024

 

Introduction

Are you tired of juggling multiple Kubernetes clusters, desperately trying to match your ML/AI workloads to the right resources? A smart K8s fleet manager like the Elotl Nova policy-driven multi-cluster orchestrator simplifies the use of multiple clusters by presenting a single K8s endpoint for workload submission and by choosing a target cluster for each workload based on placement policies and the available capacity of candidate clusters. Nova is autoscaler-aware, detecting whether workload clusters are running the K8s cluster autoscaler or the Elotl Luna intelligent cluster autoscaler.

In this blog, we examine how Nova policies combined with its autoscaler-awareness can be used to achieve a variety of "right place, right size" outcomes for several common ML/AI GPU workload scenarios. When Nova and Luna team up you can:
  1. Reduce the latency of critical ML/AI workloads by scheduling on available GPU compute.
  2. Reduce your bill by directing experimental jobs to sunk-cost clusters.
  3. Reduce your costs via policies that select GPUs with the desired price/performance.


Read More

Using NVIDIA GPU Time-slicing in Cloud Kubernetes Clusters with the Luna Smart Cluster Autoscaler

6/25/2024

 

Introduction

Kubernetes (K8s) workloads are given exclusive access to their allocated GPUs by default.  With NVIDIA GPU time-slicing, GPUs can be shared among K8s workloads by interleaving their GPU use.  For cloud K8s clusters running non-demanding GPU workloads, configuring NVIDIA GPU time-slicing can significantly reduce GPU costs. Note that NVIDIA GPU time-slicing is intended for non-production test/dev workloads, as it does not enforce memory and fault isolation.

Using NVIDIA GPU time-slicing in a cloud Kubernetes cluster with a cluster autoscaler (CA) that is aware of the time-slicing configuration can significantly reduce costs. A time-slice-aware "smart" CA prevents initial over-allocation of instances, optimizes instance selection, and reduces the risk of exceeding quotas and capacity limits. Also, on GKE, where GPU time-slicing is expected to be configured at the control plane level, a smart CA facilitates using time-slicing on GPU resources that are dynamically allocated.
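As a sketch, the config below follows the format documented for the NVIDIA Kubernetes device plugin: each physical GPU is advertised as four schedulable `nvidia.com/gpu` resources. The ConfigMap name and namespace are illustrative, and as noted above, GKE configures time-sharing at the control plane rather than through this plugin config; verify the format against your plugin version.

```yaml
# Time-slicing config for the NVIDIA device plugin: one physical GPU
# is exposed as 4 allocatable nvidia.com/gpu resources.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # hypothetical name
  namespace: kube-system              # assumes the plugin runs here
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
```

With four replicas configured, four pods each requesting one `nvidia.com/gpu` can interleave on a single physical GPU, and a time-slice-aware CA knows it needs only one GPU instance, not four.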



Read More

Unleashing the Power of ARM: Elevating Your Kubernetes Workloads with ARM Nodes

4/29/2024

 
The recent surge in ARM processor capabilities has sparked a wave of exploration beyond their traditional mobile-device domain. This blog explains why you may want to consider using ARM nodes for your Kubernetes workloads. We'll identify the potential benefits of leveraging ARM nodes for containerized deployments while acknowledging the inherent trade-offs and the scenarios where x86-64 architectures perform better and thus remain a better fit. Lastly, we'll describe a seamless way to add ARM nodes to your Kubernetes clusters.

In this blog, for the sake of clarity and brevity, I will be using the term 'ARM' to refer to ARM64 or ARM 64-bit processors, while 'x86' or 'x86-64' will be used interchangeably to denote Intel or AMD 64-bit processors.

What Kubernetes Workloads Tend To Be Ideal for ARM Processors?

Inference-heavy tasks:

While the computations involved in Deep Learning training typically require GPUs for acceptable performance, DL inference is less computationally intense.  Tasks that apply pre-trained models for DL regression or classification can benefit from ARM's power/performance relative to GPU or x86-64 systems. We presented data on running inference on ARM64 in our Scale20x talk.
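Steering such an inference workload onto ARM nodes uses the standard `kubernetes.io/arch` node label, as in the sketch below; the image name is hypothetical and must be built for arm64 (or be a multi-arch image).

```yaml
# Pod pinned to ARM nodes via the well-known architecture label.
apiVersion: v1
kind: Pod
metadata:
  name: arm-inference
spec:
  nodeSelector:
    kubernetes.io/arch: arm64        # schedule only onto ARM64 nodes
  containers:
  - name: inference
    image: my-multiarch-inference:latest   # hypothetical multi-arch image
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
```

With a cluster autoscaler that can provision ARM instance types, this selector alone is enough to trigger the allocation of an ARM node when none is available.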

Read More


© 2025 Elotl, Inc.