Ollama Docker Setup
Welcome to the Ollama Docker Setup documentation! This guide will help you set up and run Ollama with a FastAPI wrapper and a Caddy reverse proxy using Docker Compose.
Services Overview
The setup consists of three main services:
Ollama Service
The core service providing LLM functionality:
- Based on the ollama/ollama:latest image
- GPU support enabled
- Runs on port 11434
- Configurable through environment variables:
  - NVIDIA_VISIBLE_DEVICES: Controls GPU visibility (default: all)
  - OLLAMA_CONCURRENT_REQUESTS: Number of concurrent requests (default: 1)
  - OLLAMA_QUEUE_ENABLED: Queue system status (default: true)
  - OLLAMA_CONTEXT_LENGTH: Context length for models (default: 8192)
FastAPI Wrapper
A custom service providing an API interface:
- Built using a custom Dockerfile.wrapper
- Runs on port 5000
- Environment variables:
  - PYTHONUNBUFFERED: Set to 1 for unbuffered output
  - SESSION_API_KEY: Optional API key for session management
Caddy Service
Reverse proxy service:
- Built using a custom Dockerfile.caddy
- Runs on port 3334 (configurable)
- Environment variables:
  - PUBLIC_ACCESS_PORT: Port configuration (default: 3334)
Installation
Clone the repository:
git clone https://github.com/ClinicianFOCUS/local-llm-container.git
cd local-llm-container
Launch the services:
docker-compose up -d
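Right after `docker-compose up -d` returns, the containers may still be starting, so requests can fail for a short while. The minimal Python sketch below polls the proxy until it answers. It assumes that Caddy forwards Ollama's `GET /api/tags` endpoint on port 3334 over plain HTTP, as the curl examples do. `wait_for_api` is a hypothetical helper and is not part of this project.

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_status(url: str) -> int:
    """Perform a real GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_for_api(
    url: str = "http://localhost:3334/api/tags",  # assumed proxied Ollama endpoint
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Optional[Callable[[str], int]] = None,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    `fetch` returns a status code; it defaults to a real GET and can be
    replaced (e.g. in tests) without touching the network.
    """
    fetch = fetch or _http_status
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # urllib.error.URLError/HTTPError and connection errors are all
            # OSError subclasses, so a service that is still starting lands here.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Injecting `fetch` keeps the retry logic separate from the HTTP call. That makes it easy to point the helper at a different URL or to test it offline.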
Using the Services
Launching Models
You can launch models using either the CLI or API interface.
CLI Method
Connect to the Ollama container:
docker exec -it ollama-service bash
Pull your desired model:
ollama pull gemma2:2b-instruct-q8_0
Run the model:
ollama run gemma2:2b-instruct-q8_0
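You can also script the same CLI steps without opening an interactive shell. The sketch below does this with Python's `subprocess`, assuming the container is named `ollama-service` as in the `docker exec` example above. `ollama_exec`, `pull_model` and `run_prompt` are hypothetical helper names.

```python
import subprocess
from typing import List, Sequence

CONTAINER = "ollama-service"  # container name from the docker exec example above


def ollama_exec(args: Sequence[str], container: str = CONTAINER) -> List[str]:
    """Build a non-interactive `docker exec <container> ollama ...` command."""
    return ["docker", "exec", container, "ollama", *args]


def pull_model(model: str) -> None:
    # check=True raises CalledProcessError if the pull fails.
    subprocess.run(ollama_exec(["pull", model]), check=True)


def run_prompt(model: str, prompt: str) -> str:
    # Passing the prompt as an argument makes `ollama run` answer once and
    # exit instead of opening the interactive chat session.
    result = subprocess.run(
        ollama_exec(["run", model, prompt]),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Usage: `pull_model("gemma2:2b-instruct-q8_0")` followed by `run_prompt("gemma2:2b-instruct-q8_0", "Your prompt here")`.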
API Method
Pull a model via API:
curl -X POST http://localhost:3334/api/pull \
-H "Content-Type: application/json" \
-d '{"name": "gemma2:2b-instruct-q8_0"}'
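The pull call above can be sketched in Python with only the standard library. This is a sketch under two assumptions: the proxy is reachable at `http://localhost:3334`, and the wrapper forwards the JSON body to Ollama unchanged. The added `"stream": false` is an Ollama `/api/pull` option that returns a single JSON object instead of a stream of progress lines. `build_pull_request` and `pull_model` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3334"  # Caddy proxy address used in the curl examples


def build_pull_request(model: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /api/pull request from the curl example."""
    body = json.dumps({"name": model, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/pull",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_model(model: str) -> dict:
    """Send the pull request and return Ollama's final status object."""
    with urllib.request.urlopen(build_pull_request(model)) as resp:
        return json.load(resp)
```

Building the `Request` separately from sending it lets you inspect or adjust it, for example to add headers, before any network traffic happens.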
Generate with the model:
curl -X POST http://localhost:3334/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "gemma2:2b-instruct-q8_0",
"prompt": "Your prompt here"
}'
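By default Ollama's `/api/generate` streams its reply as newline-delimited JSON. Each line carries a `response` fragment and a `done` flag, so the full answer is the concatenation of the fragments. The Python sketch below assumes the proxy passes that stream through unchanged. `collect_generate_stream` and `generate` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Iterable


def collect_generate_stream(lines: Iterable[bytes]) -> str:
    """Join the `response` fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between JSON objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final object has done=true
    return "".join(parts)


def generate(model: str, prompt: str, base_url: str = "http://localhost:3334") -> str:
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields one line (bytes) at a time.
        return collect_generate_stream(resp)
```

Parsing line by line means you could print each fragment as it arrives instead of waiting for the whole answer.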
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| NVIDIA_VISIBLE_DEVICES | all | GPU devices available to Ollama |
| OLLAMA_CONCURRENT_REQUESTS | 1 | Maximum concurrent requests |
| OLLAMA_QUEUE_ENABLED | true | Enable/disable request queue |
| SESSION_API_KEY | (none) | API key for FastAPI wrapper |
| PUBLIC_ACCESS_PORT | 3334 | External port for Caddy |
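Scripts that talk to this stack may want to resolve the same settings. The sketch below mirrors the defaults in the table above. It is a hypothetical client-side helper (`StackConfig` and `load_config` are illustrative names) and does not show how the containers themselves read these variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StackConfig:
    nvidia_visible_devices: str
    ollama_concurrent_requests: int
    ollama_queue_enabled: bool
    session_api_key: Optional[str]
    public_access_port: int


def _flag(value: str) -> bool:
    """Interpret common truthy strings such as 'true' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Mapping[str, str] = os.environ) -> StackConfig:
    """Read the documented variables, falling back to the table's defaults."""
    return StackConfig(
        nvidia_visible_devices=env.get("NVIDIA_VISIBLE_DEVICES", "all"),
        ollama_concurrent_requests=int(env.get("OLLAMA_CONCURRENT_REQUESTS", "1")),
        ollama_queue_enabled=_flag(env.get("OLLAMA_QUEUE_ENABLED", "true")),
        session_api_key=env.get("SESSION_API_KEY") or None,  # empty string -> unset
        public_access_port=int(env.get("PUBLIC_ACCESS_PORT", "3334")),
    )
```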
Setting Environment Variables
Windows (PowerShell):
$env:MODEL_NAME='/models/your_models_folder'
Linux:
export MODEL_NAME=/models/your_models_folder
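The same thing can be done from Python by passing a modified environment to `docker-compose`. Docker Compose substitutes `${VAR}` references in the compose file from the calling environment. This sketch therefore only has an effect for variables that the project's compose file actually references. `compose_env` and `compose_up` are hypothetical helpers.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def compose_env(overrides: Mapping[str, str],
                base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with `overrides` applied (base is not mutated)."""
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


def compose_up(overrides: Mapping[str, str]) -> None:
    # Equivalent to exporting the variables in a shell, then running docker-compose up -d.
    subprocess.run(["docker-compose", "up", "-d"], env=compose_env(overrides), check=True)
```

Usage: `compose_up({"MODEL_NAME": "/models/your_models_folder"})`.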
Accessing the Services
Access the LLM API through the Caddy reverse proxy:
- API Endpoint: https://localhost:3334/api/
- API Documentation: Ollama API Docs
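When PUBLIC_ACCESS_PORT is changed, every endpoint URL moves with it. The small Python sketch below builds proxy URLs from that variable. The default `https` scheme follows the endpoint listed above. Pass `scheme="http"` if your Caddy configuration serves plain HTTP, as the curl examples assume. `api_url` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urljoin


def api_url(path: str, scheme: str = "https", host: str = "localhost",
            env: Optional[Mapping[str, str]] = None) -> str:
    """Build a proxy API URL, honouring PUBLIC_ACCESS_PORT (default 3334)."""
    env = os.environ if env is None else env
    port = int(env.get("PUBLIC_ACCESS_PORT", "3334"))
    base = f"{scheme}://{host}:{port}/api/"
    # Strip a leading slash so urljoin appends to /api/ instead of replacing it.
    return urljoin(base, path.lstrip("/"))
```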
Resources
Available models are listed in the Ollama Model Library.
Python Modules
Below are the core Python modules used in this project.
License
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.