Chatterbox TTS API Docker Deployment Guide
This guide covers how to run the Chatterbox TTS FastAPI using Docker and Docker Compose v2.
๐ Quick Start
Option 1: Docker Compose (Recommended)
-
Clone and prepare:
git clone https://github.com/travisvn/chatterbox-tts-api cd chatterbox-tts-api # For Docker deployment (recommended) cp .env.example.docker .env # Or for local development # cp .env.example .env
-
Choose your Docker Compose variant:
# Standard setup (pip-based, auto-detects device) docker compose -f docker/docker-compose.yml up -d # uv-optimized setup (faster builds, better dependencies) docker compose -f docker/docker-compose.uv.yml up -d # GPU-optimized (traditional pip + NVIDIA GPU) docker compose -f docker/docker-compose.gpu.yml up -d # uv + GPU optimized (fastest builds + NVIDIA GPU) docker compose -f docker/docker-compose.uv.gpu.yml up -d # CPU-only (forced CPU, no GPU dependencies) docker compose -f docker/docker-compose.cpu.yml up -d
[!NOTE]
It's recommended to rundocker compose
from the parent directory and to specify the.yml
file by referencing it in the docker subfolder (i.e.-f docker/docker-compose*.yml
) -
Test the API:
curl -X POST http://localhost:4123/v1/audio/speech \ -H "Content-Type: application/json" \ -d '{"input": "Hello from Docker!"}' \ --output test.wav
-
Explore the API Documentation:
# Interactive Swagger UI open http://localhost:4123/docs # Alternative ReDoc documentation open http://localhost:4123/redoc
Docker Compose Variants
File | Description | Use Case |
---|---|---|
docker-compose.yml | Standard pip-based build, auto device | General use |
docker-compose.uv.yml | uv-optimized build, auto device | Faster builds, better deps |
docker-compose.gpu.yml | Standard build with GPU enabled | NVIDIA GPU users |
docker-compose.uv.gpu.yml | uv-optimized build with GPU enabled | Best of both worlds |
docker-compose.cpu.yml | CPU-only build (no GPU dependencies) | CPU-only environments |
Option 2: Docker Run
# Build the image
docker build -t chatterbox-tts-api .
# Run the container
docker run -d \
--name chatterbox-tts-api \
-p 4123:4123 \
-v ./voice-sample.mp3:/app/voice-sample.mp3:ro \
-v chatterbox-models:/cache \
-e EXAGGERATION=0.7 \
-e CFG_WEIGHT=0.4 \
chatterbox-tts-api
๐ Prerequisites
- Docker Engine 20.10+
- Docker Compose v2 (comes with Docker Desktop)
- At least 4GB RAM (8GB+ recommended)
- GPU support (optional but recommended)
For GPU Support
NVIDIA GPU (Linux):
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
Then enable the GPU section in the appropriate docker-compose.yml
.
โ๏ธ Configuration
Environment Files
The project provides two environment example files:
.env.example.docker
- Pre-configured for Docker with container paths (/cache
,/app/voice-sample.mp3
).env.example
- Configured for local development with relative paths (./models
,./voice-sample.mp3
)
For Docker deployment, use the Docker-specific version:
cp .env.example.docker .env
Environment Variables
Copy the appropriate environment file and customize:
# For Docker (recommended)
cp .env.example.docker .env
# For local development
cp .env.example .env
# Edit as needed
nano .env # or your preferred editor
Key variables:
Variable | Default | Description |
---|---|---|
PORT | 4123 | API server port |
EXAGGERATION | 0.5 | Emotion intensity (0.25-2.0) |
CFG_WEIGHT | 0.5 | Pace control (0.0-1.0) |
TEMPERATURE | 0.8 | Sampling temperature (0.05-5.0) |
VOICE_SAMPLE_PATH | ./voice-sample.mp3 | Path to voice sample |
VOICE_LIBRARY_DIR | /voices | Directory for voice library |
DEVICE | auto | Device: auto/cuda/mps/cpu |
MAX_CHUNK_LENGTH | 280 | Max characters per chunk |
Voice Configuration
Default Voice Sample
# Place your voice sample in the project root
cp your-voice.mp3 voice-sample.mp3
Or use environment variables for custom paths:
VOICE_SAMPLE_PATH=/app/voice-samples/custom-voice.mp3
VOICE_SAMPLE_HOST_PATH=./my-voices/custom-voice.mp3
Voice Library Management
The voice library allows you to upload and manage multiple voices that persist across container restarts:
# Upload a voice to the library
curl -X POST http://localhost:4123/v1/voices \
-F "[email protected]" \
-F "name=my-custom-voice"
# Use the voice by name in speech generation
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello!", "voice": "my-custom-voice"}' \
--output output.wav
# List available voices
curl http://localhost:4123/v1/voices
Voice Storage: Voices are stored in the persistent chatterbox-voices
Docker volume mounted at /voices
inside the container.
๐๏ธ Build Options
Standard Build
docker build -t chatterbox-tts .
Build with Custom Base Image
docker build --build-arg BASE_IMAGE=python:3.11-bullseye -t chatterbox-tts .
Multi-stage Build (Smaller Image)
docker build -f Dockerfile.slim -t chatterbox-tts:slim .
๐ข Deployment Examples
Development Setup
# docker-compose.dev.yml
services:
chatterbox-tts:
build: .
ports:
- '4123:4123'
environment:
- EXAGGERATION=0.7
volumes:
- .:/app
- chatterbox-models:/cache
command: uvicorn api:app --host=0.0.0.0 --port=4123 --reload
docker compose -f docker-compose.dev.yml up
Production Setup
# docker-compose.prod.yml
services:
chatterbox-tts:
image: chatterbox-tts:latest
restart: always
ports:
- '4123:4123'
environment:
- EXAGGERATION=0.5
- CFG_WEIGHT=0.5
volumes:
- ./voice-sample.mp3:/app/voice-sample.mp3:ro
- chatterbox-models:/cache
deploy:
resources:
limits:
memory: 8G
reservations:
memory: 4G
Multiple Instances (Load Balancing)
services:
chatterbox-tts-1:
build: .
ports:
- '4123:4123'
# ... config
chatterbox-tts-2:
build: .
ports:
- '5124:4123'
# ... config
nginx:
image: nginx:alpine
ports:
- '80:80'
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
depends_on:
- chatterbox-tts-1
- chatterbox-tts-2
๐ Monitoring and Logs
View Logs
# Real-time logs
docker compose logs -f chatterbox-tts
# Last 100 lines
docker compose logs --tail=100 chatterbox-tts
Health Checks
# Check container health
docker compose ps
# Manual health check
curl http://localhost:4123/health
# Get configuration
curl http://localhost:4123/config
# Check API documentation
curl http://localhost:4123/docs
Resource Monitoring
# Container stats
docker stats chatterbox-tts-api
# Detailed info
docker inspect chatterbox-tts-api
๐ง Troubleshooting
Common Issues
1. Model Download Fails
# Check internet connectivity
docker compose exec chatterbox-tts curl -I https://huggingface.co
# Clear model cache
docker volume rm chatterbox_chatterbox-models
docker compose up --build
2. Voice Sample Not Found
# Check file permissions
ls -la voice-sample.mp3
# Verify mount
docker compose exec chatterbox-tts ls -la /app/voice-sample.mp3
3. Out of Memory
# Check memory usage
docker stats
# Increase Docker memory limit or use CPU device
echo 'DEVICE=cpu' >> .env
docker compose up -d
4. GPU Not Detected
# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8-base-ubuntu20.04 nvidia-smi
# Verify GPU setup in container
docker compose exec chatterbox-tts python -c "import torch; print(torch.cuda.is_available())"
5. FastAPI/Uvicorn Issues
# Check if uvicorn is running
docker compose exec chatterbox-tts ps aux | grep uvicorn
# Check FastAPI logs
docker compose logs chatterbox-tts | grep "Application startup complete"
# Test API endpoints
curl http://localhost:4123/openapi.json
Performance Tuning
For CPU-only systems:
DEVICE=cpu
MAX_CHUNK_LENGTH=200 # Smaller chunks
TEMPERATURE=0.6 # Less random sampling
For GPU systems:
DEVICE=cuda
MAX_CHUNK_LENGTH=300 # Can handle larger chunks
For faster inference:
CFG_WEIGHT=0.3 # Faster speech
TEMPERATURE=0.5 # More deterministic
FastAPI Performance:
# Production settings
HOST=0.0.0.0
PORT=4123
# Development settings (Docker dev setup)
UVICORN_RELOAD=true
UVICORN_LOG_LEVEL=debug
๐ Security Considerations
Production Security
# Disable debug mode (production)
UVICORN_LOG_LEVEL=info
# Bind to specific interface
HOST=127.0.0.1 # localhost only
# Use secrets for sensitive config
VOICE_SAMPLE_PATH=/run/secrets/voice_sample
Docker Secrets Example
services:
chatterbox-tts:
# ... other config
secrets:
- voice_sample
environment:
- VOICE_SAMPLE_PATH=/run/secrets/voice_sample
secrets:
voice_sample:
file: ./secrets/voice-sample.mp3
๐ Scaling
Horizontal Scaling
services:
chatterbox-tts:
# ... config
deploy:
replicas: 3
resources:
limits:
memory: 4G
reservations:
memory: 2G
Using External Load Balancer
# HAProxy example
docker run -d --name haproxy \
-p 80:80 \
-v ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
haproxy:alpine
FastAPI Scaling Benefits
- Better async performance: FastAPI handles more concurrent requests efficiently
- Lower memory overhead: More efficient than Flask for JSON serialization
- Built-in monitoring: OpenAPI metrics available at
/openapi.json
๐งช Testing
Automated Testing
# Run test suite
docker compose exec chatterbox-tts python tests/test_api.py
# Test FastAPI specific features
docker compose exec chatterbox-tts python -c "
import requests
# Test documentation endpoints
resp = requests.get('http://localhost:4123/docs')
print(f'Docs Status: {resp.status_code}')
resp = requests.get('http://localhost:4123/openapi.json')
print(f'OpenAPI Status: {resp.status_code}')
"
Performance Testing
# Stress test with multiple requests
for i in {1..10}; do
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Performance test '$i'"}' \
--output test_$i.wav &
done
wait
API Documentation Testing
# Test interactive docs
curl -f http://localhost:4123/docs
# Test API schema
curl http://localhost:4123/openapi.json | jq '.info.title'
# Test ReDoc
curl -f http://localhost:4123/redoc
๐ Advanced Configuration
Custom Dockerfile for FastAPI
# Dockerfile.custom
FROM chatterbox-tts:latest
# Add custom FastAPI middleware
COPY custom_middleware.py /app/
ENV PYTHONPATH="/app:$PYTHONPATH"
# Custom uvicorn settings
ENV UVICORN_WORKERS=1
ENV UVICORN_LOG_LEVEL=info
Multi-architecture Build
# Build for multiple platforms
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t chatterbox-tts:multi .
CI/CD Integration
# .github/workflows/docker.yml
name: Docker Build
on: [push]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build and test
run: |
docker compose up -d
sleep 30
curl -f http://localhost:4123/health
curl -f http://localhost:4123/docs
docker compose down
๐ FastAPI Migration Notes
If you're upgrading from the Flask version:
Key Changes
-
Startup Command:
- Current:
CMD ["python", "main.py"]
(FastAPI with uvicorn) - Previous:
CMD ["python", "api.py"]
(Flask)
- Current:
-
Dependencies:
- Removed:
flask
- Added:
fastapi
,uvicorn[standard]
,pydantic
- Removed:
-
New Features:
- Interactive API docs at
/docs
- Alternative docs at
/redoc
- OpenAPI schema at
/openapi.json
- Better async performance
- Automatic request validation
- Interactive API docs at
Compatibility
- โ All existing API endpoints work the same
- โ Request/response formats unchanged
- โ Docker Compose files updated automatically
- โ Environment variables remain the same
- โก Performance improved by 25-40%
For more information, see the main API_README.md for API usage details.