Elotl
  • Home
  • Platform
    • Luna
    • Nova
  • Resources
    • Blog
    • Youtube
    • Podcast
    • Meetup
  • Usecases
    • GenAI
  • Company
    • Team
    • Careers
    • Contact
    • News
  • Free Trial
    • Luna Free Trial
    • Nova Free Trial

Self-hosted GenAI Pipeline

Generative AI (GenAI) pipelines that combine Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) can deliver good results for question-answering applications.

Self-hosting a production LLM + RAG pipeline offers the following advantages.
  • Cost: can be less expensive than public APIs
  • Scaling: avoid provider rate limits
  • Performance: avoid noisy neighbors on shared endpoints
  • Control: avoid exposure to provider outages
  • Data Privacy: keep data inside your own infrastructure

Let us use a sample GenAI Pipeline to learn how to self-host.

Sample GenAI Pipeline: Drug Research Q&A Service

A Drug Research Q&A Service would involve the following workflow steps.
[Figure: Drug Research Q&A Service workflow]
  1. The Data Team produces documents in the Document Data Source
  2. New documents in the Document Data Source trigger a message on the Message Queue (ex: AWS SQS)
  3. The Message Queue triggers a KEDA job
  4. The KEDA job reads the input text data and converts it to data embeddings (ex: using LangChain)
  5. The data embeddings are stored in a VectorDB (ex: FAISS, Weaviate, AstraDB)
  6. The End User sends a query to the RAG Service endpoint
  7. The RAG Service composes the query response from the VectorDB and the LLM Model (ex: MosaicML 7B from HuggingFace)
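The workflow steps above can be pictured with a small, dependency-free Python sketch. It is only an illustration: the bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` stands in for a VectorDB, a plain list stands in for the Message Queue, and all of these names are hypothetical. The final prompt is what a self-hosted LLM would receive in step 7.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a VectorDB (hypothetical, for illustration only)."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def ingest(queue, store):
    """Steps 3-5: drain queued documents and store their embeddings."""
    while queue:
        store.add(queue.pop(0))


def build_prompt(question, store, k=1):
    """Steps 6-7: retrieve context and compose the prompt for the LLM."""
    context = "\n".join(store.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# A plain list stands in for the Message Queue.
queue = [
    "Aspirin inhibits platelet aggregation.",
    "Metformin lowers blood glucose in type 2 diabetes.",
]
store = ToyVectorStore()
ingest(queue, store)
prompt = build_prompt("Which drug lowers blood glucose?", store)
print(prompt)
```

In a real deployment each stand-in would be replaced by the corresponding component from the list: the queue by a managed message queue, `ingest` by the KEDA job, and the toy store by the chosen VectorDB.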

Self-hosting Requirements

Infrastructure Requirements

GPU price and availability sensitivity

  • GPU availability-based scheduling across on-prem and cloud providers
  • LLM and Data Embedding jobs need GPUs
  • VectorDB can run on CPU or GPU
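As a rough illustration of price- and availability-sensitive placement, the sketch below picks the cheapest cluster that currently has enough free GPUs. The `Cluster` fields, the prices, and the `pick_cluster` helper are assumptions made up for this example; they do not describe how Nova itself makes the decision.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    free_gpus: int
    gpu_hourly_price: float  # hypothetical price per GPU-hour


def pick_cluster(clusters, gpus_needed):
    """Return the cheapest cluster with enough free GPUs, or None if none fits."""
    candidates = [c for c in clusters if c.free_gpus >= gpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.gpu_hourly_price)


fleet = [
    Cluster("on-prem", free_gpus=0, gpu_hourly_price=0.5),
    Cluster("cloud-a", free_gpus=4, gpu_hourly_price=2.0),
    Cluster("cloud-b", free_gpus=8, gpu_hourly_price=1.5),
]

# An embedding job needing 2 GPUs skips on-prem (no free GPUs) and
# lands on the cheaper of the two cloud clusters.
chosen = pick_cluster(fleet, gpus_needed=2)
print(chosen.name if chosen else "no capacity")
```

A workload that can run on CPU, such as the VectorDB, can be modeled here with `gpus_needed=0`, in which case every cluster is a candidate and price alone decides.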
Data Requirements

Data Gravity

  • Data ingestion jobs with PII data need to be scheduled to run at the source (ex: on-prem)
  • Workloads need to dynamically migrate across cloud clusters to shadow the availability of non-PII data
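The two data gravity rules can be written down as a small placement function. This is a sketch under stated assumptions: the `pii`, `dataset`, `source`, and `available_on` fields are hypothetical, and the "stay if the data is already local, otherwise follow it" rule is one simple reading of what shadowing data availability means.

```python
def place_job(job, datasets, current=None):
    """Choose a cluster for a job based on where its data lives.

    job:      {"dataset": name, "pii": bool}            (hypothetical fields)
    datasets: name -> {"source": cluster, "available_on": [clusters]}
    current:  cluster the job runs on today, if any
    """
    data = datasets[job["dataset"]]
    if job["pii"]:
        return data["source"]  # PII jobs always run at the data source
    if current in data["available_on"]:
        return current  # the data is already local, so stay put
    return data["available_on"][0]  # otherwise follow the data


datasets = {
    "patient-records": {"source": "on-prem", "available_on": ["on-prem"]},
    "public-papers": {"source": "on-prem", "available_on": ["cloud-b"]},
}

pii_job = {"dataset": "patient-records", "pii": True}
open_job = {"dataset": "public-papers", "pii": False}

print(place_job(pii_job, datasets, current="cloud-a"))   # pinned to the source
print(place_job(open_job, datasets, current="cloud-a"))  # moves to where the data is
```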
Operational Requirements

Separation of Concerns between App and Platform teams

  • AI workload manifests should require zero changes by the app developer
  • Workload scheduling policies should be owned by the Platform team

Elotl Nova satisfies all of the above requirements!
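One way to picture the separation of concerns is label matching: the App team ships an ordinary Kubernetes manifest with no placement information, and the Platform team owns a separate list of policies that map labels to clusters. The policy structure and the `resolve_cluster` helper below are hypothetical and are not Nova's actual policy schema; the sketch only shows that placement can be decided without editing the manifest.

```python
import copy


def matches(selector, labels):
    """True if every key/value in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def resolve_cluster(manifest, policies):
    """Return the cluster of the first matching policy, or None."""
    labels = manifest.get("metadata", {}).get("labels", {})
    for policy in policies:
        if matches(policy["selector"], labels):
            return policy["cluster"]
    return None


# Owned by the App team: a plain Deployment manifest, no scheduling hints.
app_manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "rag-query-service",
        "namespace": "drug-qa",
        "labels": {"app": "rag"},
    },
}

# Owned by the Platform team (hypothetical structure).
policies = [
    {"selector": {"app": "embedding-job"}, "cluster": "cloud-gpu"},
    {"selector": {"app": "rag"}, "cluster": "on-prem-gpu"},
]

before = copy.deepcopy(app_manifest)
target = resolve_cluster(app_manifest, policies)
print(target)
```

Changing where the service runs then means editing only the Platform team's policy list; the manifest stays byte-for-byte the same.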

Pipeline blueprint

All components except the Document Datastore and Message Queue run on Kubernetes clusters.
[Figure: pipeline blueprint, with Day 0 rollout shown in green and Day 2 maintenance in red]

  • Pipeline Rollout on Day 0 (green): Choose a primary deployment location (ex: on-prem) for hosting KEDA, the Message Queue Scaler, the KEDA Scaled Job, the RAG Query Service, and the LLM model and Service. Roll out the Pipeline components into Kubernetes namespace(s) across multiple clusters using Nova.
  • Pipeline Maintenance on Day 2 (red): A new data set emerges on a secondary cloud. Automatically migrate a subset of Pipeline components to the secondary cloud using Nova.
  • Pipeline Teardown: One-step Pipeline teardown: just delete its Kubernetes namespace(s)!

Nova In Action!

Watch Nova schedule and dynamically reschedule the Q&A Service Pipeline components across a fleet of 10 GPU+CPU clusters spanning 2 clouds.

Takeaways

  • Self-hosting a GenAI Pipeline offers cost, performance, and privacy gains
  • GenAI Pipeline Rollout
    • Day 0 cost-effective scheduling of Pipeline components in a hybrid-cloud environment can be automated using Nova
  • GenAI Pipeline Maintenance
    • Day 2 mobility of Pipeline components across clouds can be automated using Nova
  • GenAI Pipeline Teardown
    • Teardown of Pipeline components can be automated using Nova by simply removing the namespace(s)
  • Optionally, Elotl Luna can be used for dynamic resource allocation on each of the workload clusters managed by Nova

Ready to get started with Nova?

DOWNLOAD AND TRY NOVA!
READ NOVA DOCUMENTATION
SELF-HOST GENAI STACK
​© 2025 Elotl, Inc.