Chatterbox TTS API
FastAPI-powered REST API for Chatterbox TTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities and additional features on top of the chatterbox-tts base package.
Features
- OpenAI-Compatible API - Drop-in replacement for OpenAI's TTS API
- FastAPI Performance - High-performance async API with automatic documentation
- React Frontend - Includes an optional, ready-to-use web interface
- Voice Cloning - Use your own voice samples for personalized speech
- Voice Library Management - Upload, manage, and use custom voices by name
- Smart Text Processing - Automatic chunking for long texts
- Real-time Status - Monitor TTS progress, statistics, and request history
- Docker Ready - Full containerization with persistent voice storage
- Configurable - Extensive environment variable configuration
- Parameter Control - Real-time adjustment of speech characteristics
- Auto Documentation - Interactive API docs at /docs and /redoc
- Type Safety - Full Pydantic validation for requests and responses
- Memory Management - Advanced memory monitoring and automatic cleanup
Quick Start
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
uv sync
uv run main.py
(Requires uv; install it with curl -LsSf https://astral.sh/uv/install.sh | sh)
Local Installation with Python
Option A: Using uv (Recommended - Faster & Better Dependencies)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies with uv (automatically creates venv)
uv sync
# Copy and customize environment variables
cp .env.example .env
# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py
Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See the migration guide →
Option B: Using pip (Traditional)
# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Set up the environment using Python 3.11
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy and customize environment variables
cp .env.example .env
# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3
# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py
Ran into issues? Check the troubleshooting section
Docker (Recommended)
# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
# Use Docker-optimized environment variables
cp .env.example.docker .env # Docker-specific paths, ready to use
# Or: cp .env.example .env # Local development paths, needs customization
# Choose your deployment method:
# API Only (default)
docker compose -f docker/docker-compose.yml up -d # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d # CPU-only
# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d # uv + GPU + Frontend
# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f
# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello from Chatterbox TTS!"}' \
--output test.wav
Running with the Web UI (Full Stack)
This project includes an optional React-based web UI. Use Docker Compose profiles to easily opt in or out of the frontend:
With Docker Compose Profiles
# API only (default behavior)
docker compose -f docker/docker-compose.yml up -d
# API + Frontend + Web UI (with --profile frontend)
docker compose -f docker/docker-compose.yml --profile frontend up -d
# Or use the convenient helper script for fullstack:
python start.py fullstack
# Same pattern works with all deployment variants:
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d # GPU + Frontend
docker compose -f docker/docker-compose.uv.yml --profile frontend up -d # uv + Frontend
docker compose -f docker/docker-compose.cpu.yml --profile frontend up -d # CPU + Frontend
Local Development
For local development, you can run the API and frontend separately:
# Start the API first (follow earlier instructions)
# Then run the frontend:
cd frontend && npm install && npm run dev
Click the link provided by Vite to access the web UI.
Build for Production
Build the frontend for production deployment:
cd frontend && npm install && npm run build
You can then access it directly from your local file system at /dist/index.html.
Port Configuration
- API Only: Accessible at http://localhost:4123 (direct API access)
- With Frontend: Web UI at http://localhost:4321, API requests routed via proxy

The frontend uses a reverse proxy to route requests, so when running with --profile frontend, the web interface is available at http://localhost:4321 while the API runs behind the proxy.
Screenshots of Frontend (Web UI)
View screenshots of the full frontend web UI → light mode / dark mode
API Usage
Basic Text-to-Speech (Default Voice)
This endpoint works for both the API-only and full-stack setups.
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Your text here"}' \
--output speech.wav
Using Custom Parameters (JSON)
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9}' \
--output dramatic.wav
Custom Voice Upload
Upload your own voice sample for personalized speech:
curl -X POST http://localhost:4123/v1/audio/speech/upload \
-F "input=Hello with my custom voice!" \
-F "exaggeration=0.8" \
-F "voice_file=@my_voice.mp3" \
--output custom_voice_speech.wav
With Custom Parameters and Voice Upload
curl -X POST http://localhost:4123/v1/audio/speech/upload \
-F "input=Dramatic speech!" \
-F "exaggeration=1.2" \
-F "cfg_weight=0.3" \
-F "temperature=0.9" \
-F "voice_file=@dramatic_voice.wav" \
--output dramatic.wav
Voice Library Management
Store and manage custom voices by name for reuse across requests:
# Upload a voice to the library
curl -X POST http://localhost:4123/v1/voices \
-F "voice_file=@my_voice.wav" \
-F "name=my-custom-voice"
# Use the voice by name in speech generation
curl -X POST http://localhost:4123/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{"input": "Hello with my custom voice!", "voice": "my-custom-voice"}' \
--output custom_voice_output.wav
# List all available voices
curl http://localhost:4123/v1/voices
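The same flow in Python, using the requests library against the endpoints shown above (a minimal sketch; assumes a local server on port 4123 and a sample file my_voice.wav):

import requests

BASE = "http://localhost:4123"

# Upload a voice sample to the library under a reusable name
with open("my_voice.wav", "rb") as f:
    requests.post(
        f"{BASE}/v1/voices",
        data={"name": "my-custom-voice"},
        files={"voice_file": ("my_voice.wav", f, "audio/wav")},
    )

# Reference the stored voice by name in any speech request
response = requests.post(
    f"{BASE}/v1/audio/speech",
    json={"input": "Hello with my custom voice!", "voice": "my-custom-voice"},
)
with open("custom_voice_output.wav", "wb") as f:
    f.write(response.content)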
Complete Voice Library Documentation →
Real-time Audio Streaming
The API supports real-time audio streaming for lower latency and a better user experience. Audio chunks are generated and sent as they are ready, so playback can start before generation completes.
Quick Start
# Basic streaming
curl -X POST http://localhost:4123/v1/audio/speech/stream \
-H "Content-Type: application/json" \
-d '{"input": "This streams in real-time!"}' \
--output streaming.wav
# Real-time playback
curl -X POST http://localhost:4123/v1/audio/speech/stream \
-H "Content-Type: application/json" \
-d '{"input": "Play as it generates!"}' \
| ffplay -f wav -i pipe:0 -autoexit -nodisp
Complete Streaming Documentation →
The complete streaming documentation covers:
- Advanced chunking strategies (sentence, paragraph, word, fixed); a toy sketch of the sentence strategy follows the benefits list below
- Quality presets (fast, balanced, high)
- Configurable parameters and performance tuning
- Real-time progress monitoring
- Python, JavaScript, and cURL examples
- Integration patterns for different use cases
Key Benefits:
- Lower latency - Start hearing audio in 1-2 seconds
- Better UX - No waiting for complete generation
- Memory efficient - Process chunks individually
- Configurable - Choose speed vs quality trade-offs
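To make the chunking idea concrete, here is a toy sentence-based chunker in Python. It illustrates the general technique named in the strategy list above, not the API's actual implementation (the API chunks text server-side):

import re

def chunk_text(text: str, max_len: int = 200) -> list[str]:
    # Split on sentence boundaries, then pack sentences into chunks
    # no longer than max_len characters (mirrors the MAX_CHUNK_LENGTH idea)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_len:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("First sentence. Second one! A third, longer question?", 40))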
Python Examples
Default Voice (JSON)
import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)
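Because the API is a drop-in replacement for OpenAI's TTS endpoint, the official openai Python SDK can also be pointed at it. This is a sketch under assumptions: the openai package is installed, the server runs locally on port 4123, and a library voice named my-custom-voice exists:

from openai import OpenAI

# Point the SDK at the local server; no real API key is required
client = OpenAI(base_url="http://localhost:4123/v1", api_key="none")

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="my-custom-voice",
    input="Hello from the OpenAI SDK!",
) as response:
    response.stream_to_file("sdk_output.wav")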
Upload Endpoint (Default Voice)
import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech/upload",
    data={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)
Custom Voice Upload
import requests

with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech/upload",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        }
    )

with open("custom_output.wav", "wb") as f:
    f.write(response.content)
Basic Streaming Example
import requests

# Stream audio generation in real-time
response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={
        "input": "This will stream as it's generated!",
        "exaggeration": 0.8
    },
    stream=True  # Enable streaming mode
)

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            print(f"Received chunk: {len(chunk)} bytes")
Complete Streaming Examples & Documentation →
Including real-time playback, progress monitoring, custom voice uploads, and advanced integration patterns.
Voice File Requirements
Supported Formats:
- MP3 (.mp3)
- WAV (.wav)
- FLAC (.flac)
- M4A (.m4a)
- OGG (.ogg)
Requirements:
- Maximum file size: 10MB
- Recommended duration: 10-30 seconds of clear speech
- Avoid background noise for best results
- Higher quality audio produces better voice cloning
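A small client-side check against these requirements before uploading can save a round trip (a sketch; the server performs its own validation):

from pathlib import Path

ALLOWED_FORMATS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024  # 10MB limit from the requirements above

def check_voice_file(path: str) -> None:
    # Raise if a voice sample violates the documented requirements
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {p.suffix}")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError("Voice file exceeds the 10MB limit")

check_voice_file("my_voice.wav")  # passes silently if the file is valid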
Configuration
The project provides two environment example files:
- .env.example - For local development (uses ./models, ./voice-sample.mp3)
- .env.example.docker - For Docker deployment (uses /cache, /app/voice-sample.mp3)
Choose the appropriate one for your setup:
# For local development
cp .env.example .env
# For Docker deployment
cp .env.example.docker .env
Key environment variables (see the example files for the full list):

| Variable | Default | Description |
|---|---|---|
| PORT | 4123 | API server port |
| EXAGGERATION | 0.5 | Emotion intensity (0.25-2.0) |
| CFG_WEIGHT | 0.5 | Pace control (0.0-1.0) |
| TEMPERATURE | 0.8 | Sampling randomness (0.05-5.0) |
| VOICE_SAMPLE_PATH | ./voice-sample.mp3 | Voice sample for cloning |
| DEVICE | auto | Device (auto/cuda/mps/cpu) |
Voice Cloning
Replace the default voice sample:
# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3
# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env
For best results:
- Use 10-30 seconds of clear speech
- Avoid background noise
- Prefer WAV or high-quality MP3
Docker Deployment
Development
docker compose -f docker/docker-compose.yml up
Production
# Create production environment
cp .env.example.docker .env
nano .env # Set production values
# Deploy
docker compose -f docker/docker-compose.yml up -d
With GPU Support
# Use GPU-enabled compose file
# Ensure NVIDIA Container Toolkit is installed
docker compose -f docker/docker-compose.gpu.yml up -d
API Reference
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /audio/speech | POST | Generate speech from text (complete) |
| /audio/speech/upload | POST | Generate speech with voice upload |
| /audio/speech/stream | POST | Stream speech generation (docs) |
| /audio/speech/stream/upload | POST | Stream speech with voice upload (docs) |
| /health | GET | Health check and status |
| /config | GET | Current configuration |
| /v1/models | GET | Available models (OpenAI compat) |
| /status | GET | TTS processing status & progress |
| /status/progress | GET | Real-time progress (lightweight) |
| /status/statistics | GET | Processing statistics |
| /status/history | GET | Recent request history |
| /info | GET | Complete API information |
| /docs | GET | Interactive API documentation |
| /redoc | GET | Alternative API documentation |
Parameters Reference
Speech Generation Parameters
Exaggeration (0.25-2.0)
- 0.3-0.4: Professional, neutral
- 0.5: Default balanced
- 0.7-0.8: More expressive
- 1.0+: Very dramatic

CFG Weight (0.0-1.0)
- 0.2-0.3: Faster speech
- 0.5: Default pace
- 0.7-0.8: Slower, deliberate

Temperature (0.05-5.0)
- 0.4-0.6: More consistent
- 0.8: Default balance
- 1.0+: More creative/random
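Combining the three parameters is just a matter of setting them together in the request body. The presets below are illustrative picks from the ranges above, not values defined by the API:

import requests

# Illustrative (exaggeration, cfg_weight, temperature) combinations
presets = {
    "professional": (0.4, 0.6, 0.5),  # neutral, deliberate, consistent
    "default": (0.5, 0.5, 0.8),       # the documented defaults
    "dramatic": (1.2, 0.3, 1.0),      # expressive, faster, more random
}

for name, (exaggeration, cfg_weight, temperature) in presets.items():
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={
            "input": "Tuning the speech parameters.",
            "exaggeration": exaggeration,
            "cfg_weight": cfg_weight,
            "temperature": temperature,
        },
    )
    with open(f"preset_{name}.wav", "wb") as f:
        f.write(response.content)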
Memory Management
The API includes advanced memory management to prevent memory leaks and optimize performance:
Memory Management Features
- Automatic Cleanup: Periodic garbage collection and tensor cleanup
- CUDA Memory Management: Automatic GPU cache clearing
- Memory Monitoring: Real-time memory usage tracking
- Manual Controls: API endpoints for manual cleanup operations
Memory Configuration
| Variable | Default | Description |
|---|---|---|
| MEMORY_CLEANUP_INTERVAL | 5 | Clean up memory every N requests |
| CUDA_CACHE_CLEAR_INTERVAL | 3 | Clear CUDA cache every N requests |
| ENABLE_MEMORY_MONITORING | true | Enable detailed memory logging |
Memory Monitoring Endpoints
# Get memory status
curl http://localhost:4123/memory
# Trigger manual cleanup
curl "http://localhost:4123/memory?cleanup=true&force_cuda_clear=true"
# Reset memory tracking (with confirmation)
curl -X POST "http://localhost:4123/memory/reset?confirm=true"
Real-time Status Tracking
Monitor TTS processing in real-time:
# Check current processing status
curl "http://localhost:4123/v1/status/progress"
# Get detailed status with memory and stats
curl "http://localhost:4123/v1/status?include_memory=true&include_stats=true"
# View processing statistics
curl "http://localhost:4123/v1/status/statistics"
# Check request history
curl "http://localhost:4123/v1/status/history?limit=5"
# Get comprehensive API information
curl "http://localhost:4123/info"
Status Response Example:
{
  "is_processing": true,
  "status": "generating_audio",
  "current_step": "Generating audio for chunk 2/4",
  "current_chunk": 2,
  "total_chunks": 4,
  "progress_percentage": 50.0,
  "duration_seconds": 2.5,
  "text_preview": "Your text being processed..."
}
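A polling loop built on these fields might look like the following sketch (it assumes the lightweight /v1/status/progress endpoint returns the fields shown above):

import threading
import time
import requests

BASE = "http://localhost:4123"

def generate() -> None:
    # Kick off a longer generation so there is progress to watch
    response = requests.post(
        f"{BASE}/v1/audio/speech",
        json={"input": "A long passage of text to synthesize. " * 20},
    )
    with open("long_output.wav", "wb") as f:
        f.write(response.content)

worker = threading.Thread(target=generate)
worker.start()

# Poll progress until the request finishes
while worker.is_alive():
    progress = requests.get(f"{BASE}/v1/status/progress").json()
    if progress.get("is_processing"):
        print(f"{progress.get('current_step')} "
              f"({progress.get('progress_percentage', 0):.0f}%)")
    time.sleep(0.5)
worker.join()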
See Status API Documentation for complete details.
Memory Testing
Run the memory management test suite:
# Test memory patterns and cleanup
python tests/test_memory.py # or: uv run tests/test_memory.py
# Monitor memory during testing
watch -n 1 'curl -s http://localhost:4123/memory | jq .memory_info'
Memory Optimization Tips
For High-Volume Production:
MEMORY_CLEANUP_INTERVAL=3
CUDA_CACHE_CLEAR_INTERVAL=2
ENABLE_MEMORY_MONITORING=false # Reduce logging overhead
MAX_CHUNK_LENGTH=200 # Smaller chunks for less memory usage
For Development/Debugging:
MEMORY_CLEANUP_INTERVAL=1
CUDA_CACHE_CLEAR_INTERVAL=1
ENABLE_MEMORY_MONITORING=true
Memory Leak Prevention:
- Tensors are automatically moved to CPU before deletion
- Gradient tracking is disabled during inference
- Audio chunks are cleaned up after concatenation
- CUDA cache is periodically cleared
- Python garbage collection is triggered regularly
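In PyTorch terms, the pattern behind these bullets looks roughly like the sketch below. It illustrates the technique only; the project's actual logic lives in app/core/memory.py:

import gc
import torch

@torch.inference_mode()  # disable gradient tracking during inference
def synthesize_chunk(model, text: str) -> torch.Tensor:
    return model(text)  # hypothetical model call, for illustration only

def release(chunks: list[torch.Tensor]) -> None:
    # Move tensors to CPU before dropping the last references
    for i, chunk in enumerate(chunks):
        chunks[i] = chunk.detach().cpu()
    chunks.clear()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached GPU memory to the driver
    gc.collect()                  # force a garbage-collection pass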
Testing
Run the test script to verify the API functionality:
python tests/test_api.py
The test script will:
- Test health check endpoint
- Test models endpoint
- Test API documentation endpoints (new!)
- Generate speech for various text lengths
- Test custom parameter validation
- Test error handling with validation
- Save generated audio files as test_output_*.wav
Performance
FastAPI Benefits:
- Async support: Better concurrent request handling
- Faster serialization: JSON responses ~25% faster than Flask
- Type validation: Pydantic models prevent invalid requests
- Auto documentation: No manual API doc maintenance
Hardware Recommendations:
- CPU: Works but is slower; reduce chunk size for better memory usage
- GPU: Recommended for production, significantly faster
- Memory: 4GB minimum, 8GB+ recommended
- Concurrency: Async support allows better multi-request handling
Troubleshooting
Common Issues
CUDA/CPU Compatibility Error
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False
This happens because chatterbox-tts models require PyTorch with CUDA support, even when running on CPU. Solutions:
# Option 1: Use default setup (now includes CUDA-enabled PyTorch)
docker compose -f docker/docker-compose.yml up -d
# Option 2: Use explicit CUDA setup (traditional)
docker compose -f docker/docker-compose.gpu.yml up -d
# Option 3: Use uv + GPU setup (recommended for GPU users)
docker compose -f docker/docker-compose.uv.gpu.yml up -d
# Option 4: Use CPU-only setup (may have compatibility issues)
docker compose -f docker/docker-compose.cpu.yml up -d
# Option 5: Clear model cache and retry with CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose -f docker/docker-compose.yml up -d --build
# Option 6: Try uv for better dependency resolution
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
For local development, install PyTorch with CUDA support:
# With pip
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install chatterbox-tts
# With uv (handles this automatically)
uv sync
Windows users using pip and having issues:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install --force-reinstall typing_extensions
Port conflicts
# Change port
echo "PORT=4124" >> .env
GPU not detected
# Force CPU mode
echo "DEVICE=cpu" >> .env
Out of memory
# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env
Model download fails
# Clear cache and retry
rm -rf models/
uvicorn app.main:app --host 0.0.0.0 --port 4123 # or: uv run main.py
FastAPI startup issues
# Check if uvicorn is installed
uvicorn --version
# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug
# Alternative startup method
python main.py
Development
Project Structure
This project follows a clean, modular architecture for maintainability:
app/                          # FastAPI backend application
├── __init__.py               # Main package
├── config.py                 # Configuration management
├── main.py                   # FastAPI application
├── models/                   # Pydantic models
│   ├── requests.py           # Request models
│   └── responses.py          # Response models
├── core/                     # Core functionality
│   ├── memory.py             # Memory management
│   ├── text_processing.py    # Text processing utilities
│   └── tts_model.py          # TTS model management
└── api/                      # API endpoints
    ├── router.py             # Main router
    └── endpoints/            # Individual endpoint modules
        ├── speech.py         # TTS endpoint
        ├── health.py         # Health check
        ├── models.py         # Model listing
        ├── memory.py         # Memory management
        └── config.py         # Configuration
frontend/                     # React frontend application
├── src/
├── Dockerfile
├── nginx.conf                # Integrated proxy configuration
└── package.json
docker/                       # Docker files consolidated
├── Dockerfile
├── Dockerfile.uv             # uv-optimized image
├── Dockerfile.gpu            # GPU-enabled image
├── Dockerfile.cpu            # CPU-only image
├── Dockerfile.uv.gpu         # uv + GPU image
├── docker-compose.yml        # Standard deployment
├── docker-compose.uv.yml     # uv deployment
├── docker-compose.gpu.yml    # GPU deployment
├── docker-compose.uv.gpu.yml # uv + GPU deployment
└── docker-compose.cpu.yml    # CPU-only deployment
tests/                        # Test suite
├── test_api.py               # API tests
└── test_memory.py            # Memory tests
main.py                       # Main entry point
start.py                      # Development helper script
Quick Start Scripts
# Development mode with auto-reload
python start.py dev
# Production mode
python start.py prod
# Full Stack mode with UI (using Docker)
python start.py fullstack
# Run tests
python start.py test
# View project structure
python start.py info
Local Development
# Install in development mode (pip)
pip install -e .
# Or with uv (basic development tools)
uv sync
# Or with test dependencies (for contributors)
uv sync --group test
# Start with auto-reload (FastAPI development)
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload
# Or use the main script
python main.py
# Or use the development helper
python start.py dev
Testing
# Run API tests
python tests/test_api.py # or: uv run tests/test_api.py
# Run memory tests
python tests/test_memory.py
# Test specific endpoint
curl http://localhost:4123/health
# Check API documentation
curl http://localhost:4123/openapi.json
FastAPI Development Features
- Auto-reload: Use the --reload flag for development
- Interactive docs: Visit /docs for live API testing
- Type hints: Full IDE support with Pydantic models
- Validation: Automatic request/response validation
- Modular structure: Easy to extend and maintain
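As an illustration of that validation layer, a request model with the documented parameter ranges might look like this (a sketch; the project's actual models live in app/models/requests.py):

from typing import Optional
from pydantic import BaseModel, Field

class SpeechRequest(BaseModel):
    # Field constraints mirror the documented parameter ranges
    input: str = Field(..., min_length=1, description="Text to synthesize")
    voice: Optional[str] = Field(None, description="Library voice name")
    exaggeration: float = Field(0.5, ge=0.25, le=2.0)
    cfg_weight: float = Field(0.5, ge=0.0, le=1.0)
    temperature: float = Field(0.8, ge=0.05, le=5.0)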
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Ensure FastAPI docs are updated
- Submit a pull request
Support
- Documentation: See the API Documentation and Docker Guide
- Issues: Report bugs and feature requests via GitHub issues
- Discord: Join the Discord for this project
Integrations
Open WebUI
Customize available voices first by using the frontend at http://localhost:4321
To use Chatterbox TTS API with Open WebUI, follow these steps:
- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: http://localhost:4123/v1 (alternatively, try http://host.docker.internal:4123/v1)
  - API Key: none
  - TTS Model: tts-1 or tts-1-hd
  - TTS Voice: Name of the voice you've cloned (can also include aliases, defined in the frontend)
  - Response splitting: Paragraphs