Chatterbox TTS API

FastAPI-powered REST API for Chatterbox TTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities and additional features on top of the chatterbox-tts base package.

Features

🚀 OpenAI-Compatible API - Drop-in replacement for OpenAI's TTS API (see the example below)
⚡ FastAPI Performance - High-performance async API with automatic documentation
🎨 React Frontend - Includes an optional, ready-to-use web interface
🎭 Voice Cloning - Use your own voice samples for personalized speech
🎤 Voice Library Management - Upload, manage, and use custom voices by name
📝 Smart Text Processing - Automatic chunking for long texts
📊 Real-time Status - Monitor TTS progress, statistics, and request history
🐳 Docker Ready - Full containerization with persistent voice storage
⚙️ Configurable - Extensive environment variable configuration
🎛️ Parameter Control - Real-time adjustment of speech characteristics
📚 Auto Documentation - Interactive API docs at /docs and /redoc
🔧 Type Safety - Full Pydantic validation for requests and responses
🧠 Memory Management - Advanced memory monitoring and automatic cleanup
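
Because the API mirrors OpenAI's text-to-speech endpoints, the official openai Python client can be pointed at it. A minimal sketch, assuming the openai package is installed, the server is running on the default port, and "my-custom-voice" is a placeholder for a voice in your library:

from openai import OpenAI

# Point the official OpenAI client at the local Chatterbox server
client = OpenAI(base_url="http://localhost:4123/v1", api_key="none")

response = client.audio.speech.create(
    model="tts-1",
    voice="my-custom-voice",  # placeholder: any name from your voice library
    input="Hello from Chatterbox TTS!",
)

with open("speech.wav", "wb") as f:
    f.write(response.content)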

โšก๏ธ Quick Start

git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
uv sync
uv run main.py
TIP

Install uv with: curl -LsSf https://astral.sh/uv/install.sh | sh

Local Installation with Python 🐍

Option A: Using uv (Recommended)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies with uv (automatically creates venv)
uv sync

# Copy and customize environment variables
cp .env.example .env

# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py

💡 Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide →

Option B: Using pip (Traditional)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Set up the environment (using Python 3.11)
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and customize environment variables
cp .env.example .env

# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3

# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py

Ran into issues? Check the troubleshooting section.

Docker Installation 🐳

# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Use Docker-optimized environment variables
cp .env.example.docker .env  # Docker-specific paths, ready to use
# Or: cp .env.example .env    # Local development paths, needs customization

# Choose your deployment method:

# API Only (default)
docker compose -f docker/docker-compose.yml up -d             # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d          # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d         # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d      # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d         # CPU-only

# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d             # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d         # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d      # uv + GPU + Frontend

# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f

# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Chatterbox TTS!"}' \
  --output test.wav

🚀 Running with the Web UI (Full Stack)

This project includes an optional React-based web UI. Use Docker Compose profiles to easily opt in or out of the frontend:

With Docker Compose Profiles

# API only (default behavior)
docker compose -f docker/docker-compose.yml up -d

# API + Frontend + Web UI (with --profile frontend)
docker compose -f docker/docker-compose.yml --profile frontend up -d

# Or use the convenient helper script for fullstack:
python start.py fullstack

# Same pattern works with all deployment variants:
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d    # GPU + Frontend
docker compose -f docker/docker-compose.uv.yml --profile frontend up -d     # uv + Frontend
docker compose -f docker/docker-compose.cpu.yml --profile frontend up -d    # CPU + Frontend

Local Development

For local development, you can run the API and frontend separately:

# Start the API first (follow earlier instructions)
# Then run the frontend:
cd frontend && npm install && npm run dev

Open the local URL that Vite prints to access the web UI.

Build for Production

Build the frontend for production deployment:

cd frontend && npm install && npm run build

You can then open frontend/dist/index.html directly from your local file system.

Port Configuration

  • API Only: Accessible at http://localhost:4123 (direct API access)
  • With Frontend: Web UI at http://localhost:4321, API requests routed via proxy

The frontend uses a reverse proxy to route requests, so when running with --profile frontend, the web interface will be available at http://localhost:4321 while the API runs behind the proxy.

Screenshots of Frontend (Web UI)

[Screenshots: Frontend web UI in dark and light mode, idle and during processing]

🖼️ View screenshot of full frontend web UI: light mode / dark mode

API Usage

Basic Text-to-Speech (Default Voice)

This endpoint works for both the API-only and full-stack setups.

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output speech.wav

Using Custom Parameters (JSON)

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9}' \
  --output dramatic.wav

Custom Voice Upload

Upload your own voice sample for personalized speech:

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my custom voice!" \
  -F "exaggeration=0.8" \
  -F "voice_file=@my_voice.mp3" \
  --output custom_voice_speech.wav

With Custom Parameters and Voice Upload

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Dramatic speech!" \
  -F "exaggeration=1.2" \
  -F "cfg_weight=0.3" \
  -F "temperature=0.9" \
  -F "voice_file=@dramatic_voice.wav" \
  --output dramatic.wav

Voice Library Management

Store and manage custom voices by name for reuse across requests:

# Upload a voice to the library
curl -X POST http://localhost:4123/v1/voices \
  -F "voice_file=@my_voice.wav" \
  -F "name=my-custom-voice"

# Use the voice by name in speech generation
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello with my custom voice!", "voice": "my-custom-voice"}' \
  --output custom_voice_output.wav

# List all available voices
curl http://localhost:4123/v1/voices
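
The same flow in Python, as a short sketch using the requests package (file names and the voice name are placeholders):

import requests

BASE = "http://localhost:4123"

# Upload a voice sample to the library under a reusable name
with open("my_voice.wav", "rb") as voice_file:
    requests.post(
        f"{BASE}/v1/voices",
        files={"voice_file": ("my_voice.wav", voice_file, "audio/wav")},
        data={"name": "my-custom-voice"},
    )

# Generate speech with the stored voice, referenced by name
response = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"input": "Hello with my custom voice!", "voice": "my-custom-voice"},
)
with open("custom_voice_output.wav", "wb") as f:
    f.write(response.content)

# List all available voices
print(requests.get(f"{BASE}/v1/voices").json())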

🔧 Complete Voice Library Documentation →

🎵 Real-time Audio Streaming

The API supports real-time audio streaming for lower latency and a better user experience. Audio chunks are generated and sent as they're ready, so playback can begin before generation completes.

Quick Start

# Basic streaming
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "This streams in real-time!"}' \
  --output streaming.wav

# Real-time playback
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Play as it generates!"}' \
  | ffplay -f wav -i pipe:0 -autoexit -nodisp

🚀 Complete Streaming Documentation →

See the complete streaming documentation for:

  • Advanced chunking strategies (sentence, paragraph, word, fixed)
  • Quality presets (fast, balanced, high)
  • Configurable parameters and performance tuning
  • Real-time progress monitoring
  • Python, JavaScript, and cURL examples
  • Integration patterns for different use cases

Key Benefits:

  • ⚡ Lower latency - Start hearing audio in 1-2 seconds
  • 🎯 Better UX - No waiting for complete generation
  • 💾 Memory efficient - Process chunks individually
  • 🎛️ Configurable - Choose speed vs. quality trade-offs
๐Ÿ Python Examples

Default Voice (JSON)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Upload Endpoint (Default Voice)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech/upload",
    data={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Custom Voice Upload

import requests

with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech/upload",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        }
    )

with open("custom_output.wav", "wb") as f:
    f.write(response.content)

Basic Streaming Example

import requests

# Stream audio generation in real-time
response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={
        "input": "This will stream as it's generated!",
        "exaggeration": 0.8
    },
    stream=True  # Enable streaming mode
)

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            print(f"Received chunk: {len(chunk)} bytes")

📚 Complete Streaming Examples & Documentation →

Including real-time playback, progress monitoring, custom voice uploads, and advanced integration patterns.
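
As one concrete playback pattern, streamed chunks can be piped straight into ffplay, the same player used in the cURL example above. A sketch, assuming ffplay is on your PATH:

import subprocess

import requests

# Start ffplay reading WAV data from stdin
player = subprocess.Popen(
    ["ffplay", "-f", "wav", "-i", "pipe:0", "-autoexit", "-nodisp"],
    stdin=subprocess.PIPE,
)

response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={"input": "Play this while it is still being generated!"},
    stream=True,
)

# Forward each chunk to the player as soon as it arrives
for chunk in response.iter_content(chunk_size=8192):
    if chunk:
        player.stdin.write(chunk)

player.stdin.close()
player.wait()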

Voice File Requirements

Supported Formats:

  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • M4A (.m4a)
  • OGG (.ogg)

Requirements:

  • Maximum file size: 10MB
  • Recommended duration: 10-30 seconds of clear speech
  • Avoid background noise for best results
  • Higher quality audio produces better voice cloning
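
To fail fast on the client side, you can check a sample against these requirements before uploading. A minimal sketch; the format list and 10MB limit come from above, and the helper name is illustrative:

import os

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_SIZE_BYTES = 10 * 1024 * 1024  # 10MB limit from the requirements above

def check_voice_file(path: str) -> None:
    # Reject unsupported formats early
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {ext}")
    # Reject files over the documented size limit
    if os.path.getsize(path) > MAX_SIZE_BYTES:
        raise ValueError("Voice file exceeds the 10MB limit")

check_voice_file("my_voice.wav")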

๐ŸŽ›๏ธ Configuration

The project provides two environment example files:

  • .env.example - For local development (uses ./models, ./voice-sample.mp3)
  • .env.example.docker - For Docker deployment (uses /cache, /app/voice-sample.mp3)

Choose the appropriate one for your setup:

# For local development
cp .env.example .env

# For Docker deployment
cp .env.example.docker .env

Key environment variables (see the example files for full list):

Variable          | Default            | Description
------------------|--------------------|-------------------------------
PORT              | 4123               | API server port
EXAGGERATION      | 0.5                | Emotion intensity (0.25-2.0)
CFG_WEIGHT        | 0.5                | Pace control (0.0-1.0)
TEMPERATURE       | 0.8                | Sampling randomness (0.05-5.0)
VOICE_SAMPLE_PATH | ./voice-sample.mp3 | Voice sample for cloning
DEVICE            | auto               | Device (auto/cuda/mps/cpu)
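
For example, a local-development .env built from the variables above might look like this (values shown are the defaults from the table):

PORT=4123
EXAGGERATION=0.5
CFG_WEIGHT=0.5
TEMPERATURE=0.8
VOICE_SAMPLE_PATH=./voice-sample.mp3
DEVICE=auto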

🎭 Voice Cloning

Replace the default voice sample:

# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3

# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env

For best results:

  • Use 10-30 seconds of clear speech
  • Avoid background noise
  • Prefer WAV or high-quality MP3
๐Ÿณ Docker Deployment

Development

docker compose -f docker/docker-compose.yml up

Production

# Create production environment
cp .env.example.docker .env
nano .env  # Set production values

# Deploy
docker compose -f docker/docker-compose.yml up -d

With GPU Support

# Use GPU-enabled compose file
# Ensure NVIDIA Container Toolkit is installed
docker compose -f docker/docker-compose.gpu.yml up -d

📚 API Reference

API Endpoints

Endpoint                    | Method | Description
----------------------------|--------|----------------------------------------
/audio/speech               | POST   | Generate speech from text (complete)
/audio/speech/upload        | POST   | Generate speech with voice upload
/audio/speech/stream        | POST   | Stream speech generation (docs)
/audio/speech/stream/upload | POST   | Stream speech with voice upload (docs)
/health                     | GET    | Health check and status
/config                     | GET    | Current configuration
/v1/models                  | GET    | Available models (OpenAI compat)
/status                     | GET    | TTS processing status & progress
/status/progress            | GET    | Real-time progress (lightweight)
/status/statistics          | GET    | Processing statistics
/status/history             | GET    | Recent request history
/info                       | GET    | Complete API information
/docs                       | GET    | Interactive API documentation
/redoc                      | GET    | Alternative API documentation

Parameters Reference

Speech Generation Parameters

Exaggeration (0.25-2.0)

  • 0.3-0.4: Professional, neutral
  • 0.5: Default balanced
  • 0.7-0.8: More expressive
  • 1.0+: Very dramatic

CFG Weight (0.0-1.0)

  • 0.2-0.3: Faster speech
  • 0.5: Default pace
  • 0.7-0.8: Slower, deliberate

Temperature (0.05-5.0)

  • 0.4-0.6: More consistent
  • 0.8: Default balance
  • 1.0+: More creative/random
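
Put together, these presets map directly onto request parameters. A sketch that renders the same line in a neutral and a dramatic style (preset values are taken from the guide above):

import requests

URL = "http://localhost:4123/v1/audio/speech"

# Preset values taken from the parameter guide above
PRESETS = {
    "professional": {"exaggeration": 0.35, "cfg_weight": 0.5, "temperature": 0.5},
    "dramatic": {"exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9},
}

for name, params in PRESETS.items():
    response = requests.post(URL, json={"input": "The same line, two styles.", **params})
    with open(f"{name}.wav", "wb") as f:
        f.write(response.content)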

🧠 Memory Management

The API includes advanced memory management to prevent memory leaks and optimize performance:

Memory Management Features

  • Automatic Cleanup: Periodic garbage collection and tensor cleanup
  • CUDA Memory Management: Automatic GPU cache clearing
  • Memory Monitoring: Real-time memory usage tracking
  • Manual Controls: API endpoints for manual cleanup operations

Memory Configuration

Variable                  | Default | Description
--------------------------|---------|-----------------------------------
MEMORY_CLEANUP_INTERVAL   | 5       | Clean up memory every N requests
CUDA_CACHE_CLEAR_INTERVAL | 3       | Clear CUDA cache every N requests
ENABLE_MEMORY_MONITORING  | true    | Enable detailed memory logging

Memory Monitoring Endpoints

# Get memory status
curl http://localhost:4123/memory

# Trigger manual cleanup
curl "http://localhost:4123/memory?cleanup=true&force_cuda_clear=true"

# Reset memory tracking (with confirmation)
curl -X POST "http://localhost:4123/memory/reset?confirm=true"

Real-time Status Tracking

Monitor TTS processing in real-time:

# Check current processing status
curl "http://localhost:4123/v1/status/progress"

# Get detailed status with memory and stats
curl "http://localhost:4123/v1/status?include_memory=true&include_stats=true"

# View processing statistics
curl "http://localhost:4123/v1/status/statistics"

# Check request history
curl "http://localhost:4123/v1/status/history?limit=5"

# Get comprehensive API information
curl "http://localhost:4123/info"

Status Response Example:

{
  "is_processing": true,
  "status": "generating_audio",
  "current_step": "Generating audio for chunk 2/4",
  "current_chunk": 2,
  "total_chunks": 4,
  "progress_percentage": 50.0,
  "duration_seconds": 2.5,
  "text_preview": "Your text being processed..."
}

See Status API Documentation for complete details.
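
For example, a client can poll the lightweight progress endpoint from a background thread while a long request runs. A sketch using the response fields shown above:

import threading
import time

import requests

BASE = "http://localhost:4123"

def watch_progress(stop: threading.Event) -> None:
    # Poll the lightweight progress endpoint until told to stop
    while not stop.is_set():
        status = requests.get(f"{BASE}/v1/status/progress").json()
        if status.get("is_processing"):
            percent = status.get("progress_percentage") or 0
            print(f"{percent:.0f}% - {status.get('current_step', '')}")
        time.sleep(0.5)

stop_event = threading.Event()
threading.Thread(target=watch_progress, args=(stop_event,), daemon=True).start()

# Long request; progress lines print while it runs
response = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"input": "A long passage of text to synthesize. " * 20},
)
stop_event.set()

with open("long_output.wav", "wb") as f:
    f.write(response.content)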

Memory Testing

Run the memory management test suite:

# Test memory patterns and cleanup
python tests/test_memory.py  # or: uv run tests/test_memory.py

# Monitor memory during testing
watch -n 1 'curl -s http://localhost:4123/memory | jq .memory_info'

Memory Optimization Tips

For High-Volume Production:

MEMORY_CLEANUP_INTERVAL=3
CUDA_CACHE_CLEAR_INTERVAL=2
ENABLE_MEMORY_MONITORING=false  # Reduce logging overhead
MAX_CHUNK_LENGTH=200             # Smaller chunks for less memory usage

For Development/Debugging:

MEMORY_CLEANUP_INTERVAL=1
CUDA_CACHE_CLEAR_INTERVAL=1
ENABLE_MEMORY_MONITORING=true

Memory Leak Prevention (a sketch follows this list):

  • Tensors are automatically moved to CPU before deletion
  • Gradient tracking is disabled during inference
  • Audio chunks are cleaned up after concatenation
  • CUDA cache is periodically cleared
  • Python garbage collection is triggered regularly
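
A simplified sketch of what such a cleanup routine can look like (illustrative only, not the API's internal implementation):

import gc

import torch

def cleanup_memory(clear_cuda_cache: bool = False) -> None:
    # Trigger Python garbage collection to reclaim dropped tensors
    gc.collect()
    # Periodically release cached GPU memory back to the driver
    if clear_cuda_cache and torch.cuda.is_available():
        torch.cuda.empty_cache()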

🧪 Testing

Run the test script to verify the API functionality:

python tests/test_api.py

The test script will:

  • Test health check endpoint
  • Test models endpoint
  • Test API documentation endpoints (new!)
  • Generate speech for various text lengths
  • Test custom parameter validation
  • Test error handling with validation
  • Save generated audio files as test_output_*.wav

⚡ Performance

FastAPI Benefits:

  • Async support: Better concurrent request handling
  • Faster serialization: JSON responses ~25% faster than Flask
  • Type validation: Pydantic models prevent invalid requests
  • Auto documentation: No manual API doc maintenance

Hardware Recommendations:

  • CPU: Works but slower, reduce chunk size for better memory usage
  • GPU: Recommended for production, significantly faster
  • Memory: 4GB minimum, 8GB+ recommended
  • Concurrency: Async support allows better multi-request handling (see the client sketch below)
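
To exercise that concurrency from the client side, several requests can be issued at once. A sketch using httpx as the async HTTP client (an assumption; any async client works):

import asyncio

import httpx

async def synthesize(client: httpx.AsyncClient, text: str, path: str) -> None:
    response = await client.post(
        "http://localhost:4123/v1/audio/speech",
        json={"input": text},
        timeout=300.0,
    )
    response.raise_for_status()
    with open(path, "wb") as f:
        f.write(response.content)

async def main() -> None:
    # The FastAPI server can interleave these requests
    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            synthesize(client, "First concurrent request.", "first.wav"),
            synthesize(client, "Second concurrent request.", "second.wav"),
        )

asyncio.run(main())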

🔧 Troubleshooting

Common Issues

CUDA/CPU Compatibility Error

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

This happens because chatterbox-tts models require PyTorch with CUDA support, even when running on CPU. Solutions:

# Option 1: Use default setup (now includes CUDA-enabled PyTorch)
docker compose -f docker/docker-compose.yml up -d

# Option 2: Use explicit CUDA setup (traditional)
docker compose -f docker/docker-compose.gpu.yml up -d

# Option 3: Use uv + GPU setup (recommended for GPU users)
docker compose -f docker/docker-compose.uv.gpu.yml up -d

# Option 4: Use CPU-only setup (may have compatibility issues)
docker compose -f docker/docker-compose.cpu.yml up -d

# Option 5: Clear model cache and retry with CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose -f docker/docker-compose.yml up -d --build

# Option 6: Try uv for better dependency resolution
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123

For local development, install PyTorch with CUDA support:

# With pip
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install chatterbox-tts

# With uv (handles this automatically)
uv sync

Windows users using pip and having issues:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install --force-reinstall typing_extensions

Port conflicts

# Change port
echo "PORT=4124" >> .env

GPU not detected

# Force CPU mode
echo "DEVICE=cpu" >> .env

Out of memory

# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env

Model download fails

# Clear cache and retry
rm -rf models/
uvicorn app.main:app --host 0.0.0.0 --port 4123  # or: uv run main.py

FastAPI startup issues

# Check if uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py

💻 Development

Project Structure

This project follows a clean, modular architecture for maintainability:

app/                       # FastAPI backend application
├── __init__.py            # Main package
├── config.py              # Configuration management
├── main.py                # FastAPI application
├── models/                # Pydantic models
│   ├── requests.py        # Request models
│   └── responses.py       # Response models
├── core/                  # Core functionality
│   ├── memory.py          # Memory management
│   ├── text_processing.py # Text processing utilities
│   └── tts_model.py       # TTS model management
└── api/                   # API endpoints
    ├── router.py          # Main router
    └── endpoints/         # Individual endpoint modules
        ├── speech.py      # TTS endpoint
        ├── health.py      # Health check
        ├── models.py      # Model listing
        ├── memory.py      # Memory management
        └── config.py      # Configuration

frontend/                  # React frontend application
├── src/
├── Dockerfile
├── nginx.conf             # Integrated proxy configuration
└── package.json

docker/                    # Docker files (consolidated)
├── Dockerfile
├── Dockerfile.uv          # uv-optimized image
├── Dockerfile.gpu         # GPU-enabled image
├── Dockerfile.cpu         # CPU-only image
├── Dockerfile.uv.gpu      # uv + GPU image
├── docker-compose.yml     # Standard deployment
├── docker-compose.uv.yml  # uv deployment
├── docker-compose.gpu.yml # GPU deployment
├── docker-compose.uv.gpu.yml # uv + GPU deployment
└── docker-compose.cpu.yml # CPU-only deployment

tests/                     # Test suite
├── test_api.py            # API tests
└── test_memory.py         # Memory tests

main.py                    # Main entry point
start.py                   # Development helper script

Quick Start Scripts

# Development mode with auto-reload
python start.py dev

# Production mode
python start.py prod

# Full Stack mode with UI (using Docker)
python start.py fullstack

# Run tests
python start.py test

# View project structure
python start.py info

Local Development

# Install in development mode (pip)
pip install -e .

# Or with uv (basic development tools)
uv sync

# Or with test dependencies (for contributors)
uv sync --group test

# Start with auto-reload (FastAPI development)
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or use the main script
python main.py

# Or use the development helper
python start.py dev

Testing

# Run API tests
python tests/test_api.py  # or: uv run tests/test_api.py

# Run memory tests
python tests/test_memory.py

# Test specific endpoint
curl http://localhost:4123/health

# Check API documentation
curl http://localhost:4123/openapi.json

FastAPI Development Features

  • Auto-reload: Use --reload flag for development
  • Interactive docs: Visit /docs for live API testing
  • Type hints: Full IDE support with Pydantic models
  • Validation: Automatic request/response validation
  • Modular structure: Easy to extend and maintain (see the example below)
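
For instance, extending the API typically means adding a module under app/api/endpoints/ and wiring it into the main router. A hypothetical sketch; the module and the include_router wiring are assumptions, not existing code:

# app/api/endpoints/ping.py (hypothetical new module)
from fastapi import APIRouter

router = APIRouter()

@router.get("/ping")
async def ping() -> dict:
    # Trivial liveness endpoint, shown only to illustrate the pattern
    return {"status": "ok"}

# Then, in app/api/router.py (assumed wiring):
# from app.api.endpoints import ping
# router.include_router(ping.router)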
๐Ÿค Contributing
  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Ensure FastAPI docs are updated
  6. Submit a pull request

🔗 Integrations

Open WebUI

TIP

Customize available voices first by using the frontend at http://localhost:4321

To use Chatterbox TTS API with Open WebUI, follow these steps:

  • Open the Admin Panel and go to Settings -> Audio
  • Set your TTS Settings to match the following:
    • Text-to-Speech Engine: OpenAI
    • API Base URL: http://localhost:4123/v1 (alternatively, try http://host.docker.internal:4123/v1)
    • API Key: none
    • TTS Model: tts-1 or tts-1-hd
    • TTS Voice: Name of the voice you've cloned (can also include aliases, defined in the frontend)
    • Response splitting: Paragraphs

[Screenshot: Open WebUI audio settings configured for Chatterbox TTS API]

โžก๏ธ View the Open WebUI docs for installing Chatterbox TTS API