
How to Run Docker Model Runner on Your Laptop for LLM Inferencing and Chatbot Development


In this blog, we'll walk you through setting up Docker Model Runner on your laptop for Large Language Model (LLM) inferencing. You'll learn how to install Docker, enable the Model Runner, run a model like ai/qwen2.5, and interact with it through a chatbot interface built with Streamlit.



Step 1: Install Docker and Enable Docker Model Runner

Install Docker: First, you need to have Docker installed on your machine. Visit Docker's official site and download Docker Desktop for your platform.



Ensure the correct Docker Desktop version is installed


Enable Docker Model Runner: To enable Docker Model Runner, run the following command in your terminal:

docker desktop enable model-runner

Enable TCP connection: To enable access from your host machine over TCP (port 12434), run this command:


docker desktop enable model-runner --tcp 12434
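
To confirm that the Model Runner is up before moving on, you can check its status (the exact output may vary with your Docker Desktop version):

docker model status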

Step 2: Choose a Model Based on Your System Configuration

Choose a model based on the available RAM on your system. For instance, to pull and run the ai/qwen2.5 model, use:

docker model run ai/qwen2.5
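
The Model Runner CLI also lets you manage local models. Assuming the standard docker model subcommands are available in your Docker Desktop version, you can list what is already downloaded and remove a model to free disk space:

docker model list
docker model rm ai/qwen2.5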

Step 3: Ensure Accessibility from Host Machine

Verify that the Model Runner API is reachable from your host machine by running this curl command:

curl http://localhost:12434/engines/llama.cpp/v1/
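
If you prefer to check connectivity from Python (using the same requests library the chatbot in Step 6 relies on), here is a minimal sketch; it assumes the Model Runner exposes the OpenAI-compatible models listing at /engines/llama.cpp/v1/models:

import requests

# Base URL of the local Docker Model Runner endpoint (TCP access enabled on port 12434)
BASE_URL = "http://localhost:12434/engines/llama.cpp/v1"

# List the models the engine knows about; adjust the path if your
# Docker Desktop version exposes the API differently.
response = requests.get(f"{BASE_URL}/models", timeout=10)
response.raise_for_status()
print(response.json())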


Step 4: Test Full LLM Inferencing Command

Run the following curl command to test if LLM inferencing is working properly. This command uses ai/qwen2.5 for text generation:


curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/qwen2.5",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'
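
The same request can be sent from Python with the requests library. This is essentially what the Streamlit chatbot in Step 6 does, minus the UI; the endpoint and model name are the ones used above:

import requests

# OpenAI-compatible chat completions endpoint exposed by Docker Model Runner
url = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"

payload = {
    "model": "ai/qwen2.5",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Please write 500 words about the fall of Rome."},
    ],
}

# A long generation can take a while on a laptop CPU, so allow a generous timeout
response = requests.post(url, json=payload, timeout=300)
response.raise_for_status()

# The assistant's reply is nested under choices[0].message.content
print(response.json()["choices"][0]["message"]["content"])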

Step 5: Test from Another Container (Optional)

You can also test the API from another container using the following curl command:

curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/qwen2.5",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Step 6: Install Streamlit and Launch Your Chatbot


Once the Docker Model Runner is successfully running, you can create a simple chatbot interface using Streamlit.


Install Streamlit: Install Streamlit on your system using pip:


pip install streamlit
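
The chatbot script below also uses the requests library; if it is not already installed in your Python environment, add it the same way:

pip install requests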

Create a Chatbot Interface: Copy the following Python code and save it as Chatbot.py:

import streamlit as st
import requests
import os

# Set up Streamlit page
st.set_page_config(page_title="Easy Chatbot", page_icon="🤖")

# Define the Docker Model Runner backend URL
docker_ip = os.getenv("DOCKER_IP", "localhost")
DOCKER_API_URL = f"http://{docker_ip}:12434/engines/llama.cpp/v1/chat/completions"  # Replace with your actual Docker model runner endpoint

# Streamlit UI
st.title("💬 Your Own Chatbot 💬")
st.write("This is a simple chatbot using Streamlit for UI and Docker Model Runner for backend.")

# Create a text input for user to type a message
user_message = st.text_input("You: ", "")

# Maintain chat history in session state
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# Function to send message to Docker Model Runner backend and parse response
def get_docker_response(message):
    # Define the payload for Docker Model Runner API
    payload = {
        "model": "ai/qwen2.5",  # Replace with your specific model if needed
        "messages": [
            {
                "role": "user",
                "content": message
            }
        ]
    }
    
    try:
        # Send POST request to the Docker Model Runner API
        print("Firing request to ", DOCKER_API_URL)
        response = requests.post(DOCKER_API_URL, json=payload)
        
        if response.status_code == 200:
            # Extract the assistant's response from the JSON structure
            return response.json()["choices"][0]["message"]["content"]
        else:
            return f"Error: {response.status_code}, {response.text}"
    except Exception as e:
        return f"Exception occurred: {str(e)}"

# Clear previous chat history when a new message is entered
if user_message:
    # Reset chat history for each new message
    st.session_state.chat_history = [{"role": "user", "content": user_message}]

    # Get response from Docker Model Runner
    bot_response = get_docker_response(user_message)
    
    # Add bot response to chat history
    st.session_state.chat_history.append({"role": "bot", "content": bot_response})

# Display chat history
for chat in st.session_state.chat_history:
    if chat["role"] == "user":
        st.markdown(f"**You:** {chat['content']}")
    else:
        st.markdown(f"**Bot:** {chat['content']}")

Run the Streamlit App: Run your chatbot by executing the following command in your terminal:

streamlit run Chatbot.py

After running the app, you should see your chatbot interface in the browser.
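
By default the app talks to the Model Runner on localhost. If Docker is running elsewhere, set the DOCKER_IP environment variable that Chatbot.py reads before launching, for example:

DOCKER_IP=192.168.1.50 streamlit run Chatbot.py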


Streamlit chatbot with Docker Model Runner


Your Personal Chatbot is Ready!


Congratulations! You’ve now developed and deployed your own AI-powered chatbot using Docker Model Runner and Streamlit. You can chat with the model and get responses in real-time, all running locally on your laptop.


Complete working code for this example and other LLM use cases can be found at https://github.com/becloudready/llm-examples


