Using tools like ChatGPT or Meta.AI in a web browser is convenient, but leveraging their APIs for business use cases is a different story.
While public LLM providers offer very advanced capabilities, enterprise use cases are often specific enough to be served by a custom LLM + RAG deployment, without worrying about data privacy, security, or, of course, cost.
Many enterprises have jumped on the GenAI/LLM API bandwagon, only to face skyrocketing costs as usage increases.
An on-prem or private LLM API instance could be the game-changer—drastically cutting costs while addressing privacy and data sovereignty concerns.
And here's the best part: you don’t need to hire a dedicated LLM engineer.
This article outlines just how easy it is to set up a chatbot inference service for personal or internal enterprise use, all while keeping your proprietary data away from public LLM providers, using only free and open-source tools.
Benefits of Running Your Own ChatGPT-Style Chatbot ( LLM + RAG )
Data Privacy: Keep sensitive information in-house
Customization: Tailor AI to your industry needs
Cost Efficiency: Reduce ongoing usage fees
Scalability: Optimize resources for enterprise workloads
Compliance: Align with industry regulations
Integration: Seamlessly connect with enterprise systems
Innovation: Foster in-house AI innovation
Now that you're convinced of the benefits of running your own AI model, let's walk through how you can get started using DenCloud, our AI-powered cloud platform.
Launching a Virtual Machine with 8x NVIDIA A100 80GB Cards
To run your inference workloads efficiently, you can launch a virtual machine (VM) equipped with 8x NVIDIA A100 80GB GPUs. These accelerators dramatically speed up model inference, enabling faster and more efficient processing.
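As a rough sanity check on sizing, a model needs roughly parameters times bytes-per-parameter of VRAM, plus overhead for the KV cache and activations. The numbers below are rules of thumb, not vendor figures:

```shell
# Back-of-the-envelope VRAM sizing (rule of thumb, not an official formula):
# params_in_billions * bytes_per_param, plus ~20% for KV cache/activations.
estimate_vram_gb() {
  local params_b=$1 bytes_per_param=$2
  echo $(( params_b * bytes_per_param * 12 / 10 ))
}

estimate_vram_gb 8 2    # Llama 3.1 8B at fp16: ~19 GB (fits on one A100)
estimate_vram_gb 70 2   # Llama 3.1 70B at fp16: ~168 GB (needs multiple GPUs)
```

An 8x A100 80GB node provides 640 GB of aggregate VRAM, which comfortably fits even the 70B model at full precision, with room for long contexts and concurrent requests.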
Prerequisites:
Access to DenCloud Console (Dashboard - Denvr Cloud)
Your SSH key
Log in to your virtual machine using the "Public IP":
ssh -i <your-key> ubuntu@<Public-IP>
# Example:
ssh -i dencloud-hou1.pem ubuntu@<Public-IP>
Update Ubuntu 22.04 LTS Packages:
Once logged in, update the package lists to ensure you have the latest versions:
sudo apt update -y && sudo apt upgrade -y
Install Ollama (Including NVIDIA Drivers)
Ollama is a lightweight, open-source runtime for downloading, managing, and serving large language models (LLMs). By running Llama 3 with Ollama, you can deploy AI models within your own cloud environment, maintaining full control over customization and integration.
curl -fsSL https://ollama.com/install.sh | sh
To ensure optimal performance, you'll need to install additional drivers, particularly the NVIDIA Fabric Manager for HGX/DGX systems.
For HGX/DGX Nodes with NVlink - Install Fabric Manager
sudo apt-get update
sudo apt-get install -y software-properties-common
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo sh -c 'echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update
sudo apt-get install -y nvidia-fabricmanager-560
sudo systemctl enable nvidia-fabricmanager
sudo systemctl start nvidia-fabricmanager
# Note: the Fabric Manager major version (560 here) must match the installed
# NVIDIA driver version.
After installation, verify that NVIDIA drivers are properly installed by running:
nvidia-smi
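If you want to script this check, `nvidia-smi -L` prints one `GPU N: ...` line per device. The helper below is a hypothetical convenience function, not part of the NVIDIA tooling:

```shell
# count_gpus counts "GPU N: ..." lines on stdin, as printed by `nvidia-smi -L`.
count_gpus() {
  grep -c '^GPU '
}

# On the VM (not run here):
#   nvidia-smi -L | count_gpus                  # should print 8 on this node
#   systemctl is-active nvidia-fabricmanager    # should print "active"
```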
Configuring Ollama to Listen on All Interfaces
To make the Ollama service reachable from outside the VM, modify its configuration to bind to all network interfaces:
Edit the Ollama Service Configuration:
sudo vi /etc/systemd/system/ollama.service
Add the following line under the [Service] section (this is required for the API to be reachable from other hosts):
Environment="OLLAMA_HOST=0.0.0.0:11434"
Restart the Ollama Service:
sudo systemctl daemon-reload
sudo systemctl restart ollama
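After the restart, you can confirm the API answers on the new bind address. The helper below is just a thin wrapper around curl; `/api/version` is a standard Ollama endpoint and 11434 is its default port:

```shell
# ollama_up returns success if an Ollama API answers at the given host:port.
ollama_up() {
  curl -fsS --max-time 5 "http://$1/api/version" > /dev/null 2>&1
}

# On the VM (not run here):
#   ollama_up 127.0.0.1:11434 && echo "Ollama API is up"
```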
Download and Serve the Latest Llama 3.1 Model:
ubuntu@dencloud-a100-vm:~$ ollama run llama3.1
pulling manifest
pulling 8eeb52dfb3bb... 100%  4.7 GB
pulling 11ce4ee3e170... 100%  1.7 KB
pulling 0ba8f0e314b4... 100%   12 KB
pulling 56bb8bd477a5... 100%    96 B
pulling 1a4c3c319823... 100%   485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> /bye
Ensure Ollama is Using GPU:
Check that Ollama is utilizing the GPU and not the CPU:
ubuntu@dencloud-a100-vm:~$ ollama ps
NAME             ID            SIZE    PROCESSOR  UNTIL
llama3.1:latest  91ab477bec9d  6.7 GB  100% GPU   4 minutes from now
The output should confirm that the model is running on the GPU, ensuring optimal performance.
Install Docker to Run the Chat UI Frontend
Finally, install Docker to set up the Chat UI frontend, allowing you to interact with your ChatGPT-style chatbot.
Installation Steps:
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce
Run the Chat UI Frontend:
sudo docker run -d \
  --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Testing Your Model
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1",
"messages": [
{ "role": "user", "content": "Explain Big bang theory to a 10 year old child" }
],
"stream": false
}'
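The non-streaming response comes back as a JSON object with the reply text under message.content. A small helper (using python3, which ships with Ubuntu 22.04) makes it easy to pull out just the text:

```shell
# extract_reply reads an /api/chat JSON response on stdin and prints the
# assistant's message text.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["message"]["content"])'
}

# Example (not run here):
#   curl -s http://localhost:11434/api/chat -d '{...}' | extract_reply
```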
On the Chat UI interface: because the container runs with --network=host, Open WebUI listens on its default port 8080. Browse to http://<Public-IP>:8080, create the initial admin account, select the llama3.1 model, and start chatting.
Conclusion
Running your own ChatGPT-style chatbot on DenCloud provides unmatched control, customization, and efficiency for your enterprise. By following this guide, you can set up a powerful AI model tailored to your needs, ensuring privacy, compliance, and innovation at every step. With the right infrastructure and tools like Ollama, you can harness the full potential of AI to drive your business forward.