Ollama Docker Setup

Welcome to the Ollama Docker Setup documentation! This guide will help you set up and run Ollama with a FastAPI wrapper and a Caddy reverse proxy using Docker Compose.

Services Overview

The setup consists of three main services:

Ollama Service

The core service providing LLM functionality:

  • Based on the ollama/ollama:latest image

  • GPU support enabled

  • Runs on port 11434

  • Configurable through environment variables:

    • NVIDIA_VISIBLE_DEVICES: Controls GPU visibility (default: all)

    • OLLAMA_CONCURRENT_REQUESTS: Number of concurrent requests (default: 1)

    • OLLAMA_QUEUE_ENABLED: Queue system status (default: true)

    • OLLAMA_CONTEXT_LENGTH: Context length for models (default: 8192)

FastAPI Wrapper

A custom service providing an API interface:

  • Built using a custom Dockerfile.wrapper

  • Runs on port 5000

  • Environment variables:

    • PYTHONUNBUFFERED: Set to 1 for unbuffered output

    • SESSION_API_KEY: Optional API key for session management

Caddy Service

Reverse proxy service:

  • Built using a custom Dockerfile.caddy

  • Runs on port 3334 (configurable)

  • Environment variables:

    • PUBLIC_ACCESS_PORT: Port configuration (default: 3334)

Installation

  1. Clone the repository:

git clone https://github.com/ClinicianFOCUS/local-llm-container.git
cd local-llm-container

  2. Launch the services:

docker-compose up -d
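
Once the stack is up, it can be useful to confirm that the documented ports accept connections. The following is a minimal sketch using only the Python standard library. The names `DOCUMENTED_PORTS`, `port_open`, and `check_services` are hypothetical helpers, not part of this project, and whether ports 11434 and 5000 are published to the host (rather than only reachable inside the Docker network) depends on the compose file, which is an assumption here.

```python
import socket

# Ports named in the Services Overview. Whether each one is published to the
# host is an assumption; only the Caddy port is stated to be the public one.
DOCUMENTED_PORTS = {"ollama": 11434, "fastapi-wrapper": 5000, "caddy": 3334}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str = "localhost", ports=None) -> dict:
    """Map each service label to whether its port accepts TCP connections."""
    ports = DOCUMENTED_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` returns a dictionary of service labels to booleans. A `False` entry only means the port did not accept a TCP connection from the host; it does not distinguish a stopped container from an unpublished port.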

Using the Services

Launching Models

You can launch models using either the CLI or API interface.

CLI Method

  1. Connect to the Ollama container:

docker exec -it ollama-service bash

  2. Pull your desired model:

ollama pull gemma2:2b-instruct-q8_0

  3. Run the model:

ollama run gemma2:2b-instruct-q8_0
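
The CLI steps above can also be scripted without opening an interactive shell, by passing the `ollama` command directly to `docker exec`. This is a rough sketch, assuming the container is named `ollama-service` as in step 1 and that `docker` is on the PATH. The helper names `ollama_exec_args` and `pull_model` are hypothetical.

```python
import subprocess

CONTAINER = "ollama-service"  # container name taken from the CLI steps above


def ollama_exec_args(*ollama_args: str, container: str = CONTAINER) -> list:
    """Build the argv for running an ollama subcommand inside the container."""
    return ["docker", "exec", container, "ollama", *ollama_args]


def pull_model(model: str) -> None:
    """Run `ollama pull <model>` in the container; raises if the command fails."""
    subprocess.run(ollama_exec_args("pull", model), check=True)
```

For example, `pull_model("gemma2:2b-instruct-q8_0")` runs the same pull as step 2. Passing the arguments as a list avoids shell quoting issues with model tags.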

API Method

  1. Pull a model via API:

curl -X POST http://localhost:3334/api/pull \
     -H "Content-Type: application/json" \
     -d '{"name": "gemma2:2b-instruct-q8_0"}'

  2. Generate with the model:

curl -X POST http://localhost:3334/api/generate \
     -H "Content-Type: application/json" \
     -d '{
           "model": "gemma2:2b-instruct-q8_0",
           "prompt": "Your prompt here"
         }'
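
The same two calls can be made from Python. The sketch below mirrors the curl commands using only the standard library. It assumes the proxy forwards Ollama's default streaming output unchanged, meaning newline-delimited JSON objects that each carry a `response` fragment and a `done` flag. If the FastAPI wrapper alters that format, `join_stream` would need adjusting. All function names are hypothetical, and the sketch does not send `SESSION_API_KEY`, since how the wrapper expects that key is not described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3334"  # Caddy address used in the curl examples


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request like the curl commands above."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_request(model: str) -> urllib.request.Request:
    """Request equivalent to the /api/pull curl call."""
    return build_post("/api/pull", {"name": model})


def generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Request equivalent to the /api/generate curl call."""
    return build_post("/api/generate", {"model": model, "prompt": prompt})


def join_stream(lines) -> str:
    """Concatenate "response" fragments from newline-delimited JSON chunks.

    Assumption: each non-empty line is one JSON object, and the last one
    has "done" set to true.
    """
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model: str, prompt: str) -> str:
    """Send the generate request and return the full generated text."""
    with urllib.request.urlopen(generate_request(model, prompt)) as resp:
        return join_stream(resp)
```

With the services running and the model pulled, `generate("gemma2:2b-instruct-q8_0", "Your prompt here")` returns the completed text as one string. Building the request separately from sending it keeps the payload easy to inspect.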

Configuration

Environment Variables

Variable                    Default  Description
NVIDIA_VISIBLE_DEVICES      all      GPU devices available to Ollama
OLLAMA_CONCURRENT_REQUESTS  1        Maximum concurrent requests
OLLAMA_QUEUE_ENABLED        true     Enable/disable request queue
SESSION_API_KEY             (none)   API key for FastAPI wrapper
PUBLIC_ACCESS_PORT          3334     External port for Caddy
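
To illustrate how these variables and their defaults fit together, here is a small sketch that reads them from the environment into a typed settings object. It is not the project's actual configuration code. `StackSettings` and `load_settings` are hypothetical names, and the accepted spellings for boolean values are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings (the accepted set is an assumption)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class StackSettings:
    """Defaults mirror the Environment Variables table."""
    nvidia_visible_devices: str = "all"
    ollama_concurrent_requests: int = 1
    ollama_queue_enabled: bool = True
    session_api_key: Optional[str] = None  # optional, no default value
    public_access_port: int = 3334


def load_settings(env: Mapping[str, str] = os.environ) -> StackSettings:
    """Read the documented variables from env, falling back to the defaults."""
    return StackSettings(
        nvidia_visible_devices=env.get("NVIDIA_VISIBLE_DEVICES", "all"),
        ollama_concurrent_requests=int(env.get("OLLAMA_CONCURRENT_REQUESTS", "1")),
        ollama_queue_enabled=_as_bool(env.get("OLLAMA_QUEUE_ENABLED", "true")),
        session_api_key=env.get("SESSION_API_KEY") or None,
        public_access_port=int(env.get("PUBLIC_ACCESS_PORT", "3334")),
    )
```

Calling `load_settings({})` yields the documented defaults, while `load_settings({"PUBLIC_ACCESS_PORT": "8080"})` overrides only the Caddy port. Accepting a mapping rather than reading `os.environ` directly makes the loader easy to exercise without changing the real environment.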

Setting Environment Variables

Windows (PowerShell):

$env:MODEL_NAME='/models/your_models_folder'

Linux:

export MODEL_NAME=/models/your_models_folder

Accessing the Services

Access the LLM API through the Caddy reverse proxy:

  • API Endpoint: https://localhost:3334/api/

  • API Documentation: Ollama API Docs

Resources

Python Modules

Below are the core Python modules used in this project.

License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.