Chatterbox TTS API

FastAPI-powered REST API for Chatterbox TTS, providing OpenAI-compatible text-to-speech endpoints with voice cloning capabilities and additional features on top of the chatterbox-tts base package.

Features

πŸš€ OpenAI-Compatible API - Drop-in replacement for OpenAI's TTS API
⚑ FastAPI Performance - High-performance async API with automatic documentation
🌍 Multilingual Support - Generate speech in 22 languages with language-aware voice cloning
🎨 React Frontend - Includes an optional, ready-to-use web interface
🎭 Voice Cloning - Use your own voice samples for personalized speech
🎀 Voice Library Management - Upload, manage, and use custom voices by name
πŸ“ Smart Text Processing - Automatic chunking for long texts
πŸ“Š Real-time Status - Monitor TTS progress, statistics, and request history
🐳 Docker Ready - Full containerization with persistent voice storage
βš™οΈ Configurable - Extensive environment variable configuration
πŸŽ›οΈ Parameter Control - Real-time adjustment of speech characteristics
πŸ“š Auto Documentation - Interactive API docs at /docs and /redoc
πŸ”§ Type Safety - Full Pydantic validation for requests and responses
🧠 Memory Management - Advanced memory monitoring and automatic cleanup

IMPORTANT

resemble-ai/chatterbox is currently broken for non-CUDA setups (see the chatterbox issue tracker).

To revert to the pre-multilingual release, use the stable branch of this repo.

View more instructions

⚑️ Quick Start

git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api
uv sync
uv run main.py

TIP

Install uv with: curl -LsSf https://astral.sh/uv/install.sh | sh

Local Installation with Python 🐍

Option A: Using uv (Recommended)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies with uv (automatically creates venv)
uv sync

# Copy and customize environment variables
cp .env.example .env

# Start the API with FastAPI
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
uv run main.py

πŸ’‘ Why uv? Users report better compatibility with chatterbox-tts, 25-40% faster installs, and superior dependency resolution. See migration guide β†’

Option B: Using pip (Traditional)

# Clone the repository
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Set up the environment using Python 3.11
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Copy and customize environment variables
cp .env.example .env

# Add your voice sample (or use the provided one)
# cp your-voice.mp3 voice-sample.mp3

# Start the API with FastAPI
uvicorn app.main:app --host 0.0.0.0 --port 4123
# Or use the main script
python main.py

Ran into issues? Check the troubleshooting section

Docker Installation 🐳

# Clone and start with Docker Compose
git clone https://github.com/travisvn/chatterbox-tts-api
cd chatterbox-tts-api

# Use Docker-optimized environment variables
cp .env.example.docker .env  # Docker-specific paths, ready to use
# Or: cp .env.example .env    # Local development paths, needs customization

# Choose your deployment method:

# API Only (default)
docker compose -f docker/docker-compose.yml up -d             # Standard (pip-based)
docker compose -f docker/docker-compose.uv.yml up -d          # uv-optimized (faster builds)
docker compose -f docker/docker-compose.gpu.yml up -d         # Standard + GPU
docker compose -f docker/docker-compose.uv.gpu.yml up -d      # uv + GPU (recommended for GPU users)
docker compose -f docker/docker-compose.cpu.yml up -d         # CPU-only
docker compose -f docker/docker-compose.blackwell.yml up -d   # Blackwell (50XX) NVIDIA GPUs

# API + Frontend (add --profile frontend to any of the above)
docker compose -f docker/docker-compose.yml --profile frontend up -d             # Standard + Frontend
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d         # GPU + Frontend
docker compose -f docker/docker-compose.uv.gpu.yml --profile frontend up -d      # uv + GPU + Frontend
docker compose -f docker/docker-compose.blackwell.yml --profile frontend up -d   # (Blackwell) uv + GPU + Frontend

# Watch the logs as it initializes (the first use of TTS takes the longest)
docker logs chatterbox-tts-api -f

# Test the API
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello from Chatterbox TTS!"}' \
  --output test.wav

πŸš€ Running with the Web UI (Full Stack)

This project includes an optional React-based web UI. Use Docker Compose profiles to easily opt in or out of the frontend:

With Docker Compose Profiles

# API only (default behavior)
docker compose -f docker/docker-compose.yml up -d

# API + Frontend + Web UI (with --profile frontend)
docker compose -f docker/docker-compose.yml --profile frontend up -d

# Or use the convenient helper script for fullstack:
python start.py fullstack

# Same pattern works with all deployment variants:
docker compose -f docker/docker-compose.gpu.yml --profile frontend up -d    # GPU + Frontend
docker compose -f docker/docker-compose.uv.yml --profile frontend up -d     # uv + Frontend
docker compose -f docker/docker-compose.cpu.yml --profile frontend up -d    # CPU + Frontend

Local Development

For local development, you can run the API and frontend separately:

# Start the API first (follow earlier instructions)
# Then run the frontend:
cd frontend && npm install && npm run dev

Click the link provided by Vite to access the web UI.

Build for Production

Build the frontend for production deployment:

cd frontend && npm install && npm run build

You can then access it directly from your local file system at frontend/dist/index.html.

Port Configuration

  • API Only: Accessible at http://localhost:4123 (direct API access)
  • With Frontend: Web UI at http://localhost:4321, API requests routed via proxy

The frontend uses a reverse proxy to route requests, so when running with --profile frontend, the web interface will be available at http://localhost:4321 while the API runs behind the proxy.

Screenshots of Frontend (Web UI)

[Screenshots: frontend web UI in dark and light mode, idle and while processing]

πŸ–ΌοΈ View screenshot of full frontend web UI β€” light mode / dark mode

API Usage

Basic Text-to-Speech (Default Voice)

This endpoint works for both the API-only and full-stack setups.

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Your text here"}' \
  --output speech.wav
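
Because the endpoint is OpenAI-compatible, you can also point the official openai Python SDK at it. A minimal sketch, assuming the openai package is installed; "my-custom-voice" is a hypothetical voice-library name, and any placeholder API key works:

from openai import OpenAI

# Point the official SDK at the local server; no real API key is required
client = OpenAI(base_url="http://localhost:4123/v1", api_key="not-needed")

# "tts-1" is accepted for OpenAI compatibility; the voice name is hypothetical
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="my-custom-voice",
    input="Hello from the OpenAI SDK!",
) as response:
    response.stream_to_file("sdk_output.wav")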

Using Custom Parameters (JSON)

curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Dramatic speech!", "exaggeration": 1.2, "cfg_weight": 0.3, "temperature": 0.9}' \
  --output dramatic.wav

Custom Voice Upload

Upload your own voice sample for personalized speech:

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Hello with my custom voice!" \
  -F "exaggeration=0.8" \
  -F "voice_file=@my_voice.mp3" \
  --output custom_voice_speech.wav

With Custom Parameters and Voice Upload

curl -X POST http://localhost:4123/v1/audio/speech/upload \
  -F "input=Dramatic speech!" \
  -F "exaggeration=1.2" \
  -F "cfg_weight=0.3" \
  -F "temperature=0.9" \
  -F "voice_file=@dramatic_voice.wav" \
  --output dramatic.wav

Voice Library Management

Store and manage custom voices by name for reuse across requests:

# Upload a voice to the library
curl -X POST http://localhost:4123/voices \
  -F "voice_file=@my_voice.wav" \
  -F "voice_name=my-custom-voice"

# Upload a voice with language (multilingual support)
curl -X POST http://localhost:4123/voices \
  -F "voice_file=@french_voice.wav" \
  -F "voice_name=french-speaker" \
  -F "language=fr"

# Use the voice by name in speech generation
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Hello with my custom voice!", "voice": "my-custom-voice"}' \
  --output custom_voice_output.wav

# Generate French speech (language auto-detected from voice)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Bonjour, comment allez-vous?", "voice": "french-speaker"}' \
  --output french_speech.wav

# List all available voices (includes language metadata)
curl http://localhost:4123/voices

# Get supported languages
curl http://localhost:4123/languages

πŸ”§ Complete Voice Library Documentation β†’

🌍 Multilingual Support

Generate speech in 22 languages with language-aware voice cloning and automatic language detection.

Supported Languages

Arabic (ar) β€’ Danish (da) β€’ German (de) β€’ Greek (el) β€’ English (en) β€’ Spanish (es) β€’ Finnish (fi) β€’ French (fr) β€’ Hebrew (he) β€’ Hindi (hi) β€’ Italian (it) β€’ Japanese (ja) β€’ Korean (ko) β€’ Malay (ms) β€’ Dutch (nl) β€’ Norwegian (no) β€’ Polish (pl) β€’ Portuguese (pt) β€’ Russian (ru) β€’ Swedish (sv) β€’ Swahili (sw) β€’ Turkish (tr)

Quick Start

# Get supported languages
curl http://localhost:4123/languages

# Upload voice with language
curl -X POST http://localhost:4123/voices \
  -F "voice_name=spanish_speaker" \
  -F "language=es" \
  -F "voice_file=@spanish_voice.wav"

# Generate multilingual speech
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input": "Β‘Hola! ΒΏCΓ³mo estΓ‘s hoy?", "voice": "spanish_speaker"}' \
  --output spanish_speech.wav

Key Features

  • 🎯 Language Auto-Detection - Voices store language metadata, automatically used in generation
  • 🌐 No API Changes - Maintains OpenAI compatibility, language determined from voice metadata
  • πŸ”„ Configurable - Enable/disable with USE_MULTILINGUAL_MODEL environment variable
  • πŸ“š Voice Library Integration - Language badges and filtering in web UI
  • 🧠 Smart Fallback - Defaults to English for backward compatibility

πŸ“š Complete Multilingual Documentation β†’

🎡 Real-time Audio Streaming

The API supports multiple streaming formats for lower latency and better user experience:

  • Raw Audio Streaming: Traditional audio chunks (WAV format)
  • Server-Side Events (SSE): OpenAI-compatible format with base64-encoded audio chunks

Quick Start

# Basic audio streaming
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "This streams in real-time!"}' \
  --output streaming.wav

# SSE streaming (OpenAI compatible)
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"input": "This streams as Server-Side Events!", "stream_format": "sse"}' \
  --no-buffer

# Real-time playback
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{"input": "Play as it generates!"}' \
  | ffplay -f wav -i pipe:0 -autoexit -nodisp

πŸš€ Complete Streaming Documentation β†’

See the complete streaming documentation for:

  • Advanced chunking strategies (sentence, paragraph, word, fixed)
  • Quality presets (fast, balanced, high)
  • Configurable parameters and performance tuning
  • Real-time progress monitoring
  • Python, JavaScript, and cURL examples
  • Integration patterns for different use cases

Key Benefits:

  • ⚑ Lower latency - Start hearing audio in 1-2 seconds
  • 🎯 Better UX - No waiting for complete generation
  • πŸ’Ύ Memory efficient - Process chunks individually
  • πŸŽ›οΈ Configurable - Choose speed vs quality trade-offs

🐍 Python Examples

Default Voice (JSON)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Upload Voice with Language (Multilingual)

import requests

# Upload a multilingual voice
with open("german_voice.wav", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/voices",
        data={
            "voice_name": "german_speaker",
            "language": "de"
        },
        files={
            "voice_file": ("german_voice.wav", voice_file, "audio/wav")
        }
    )

print(f"Upload status: {response.status_code}")

# Generate German speech
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Guten Tag! Wie geht es Ihnen?",
        "voice": "german_speaker",
        "exaggeration": 0.8
    }
)

with open("german_output.wav", "wb") as f:
    f.write(response.content)

Upload Endpoint (Default Voice)

import requests

response = requests.post(
    "http://localhost:4123/v1/audio/speech/upload",
    data={
        "input": "Hello world!",
        "exaggeration": 0.8
    }
)

with open("output.wav", "wb") as f:
    f.write(response.content)

Custom Voice Upload

import requests

with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech/upload",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        }
    )

with open("custom_output.wav", "wb") as f:
    f.write(response.content)

Basic Streaming Example

import requests

# Stream audio generation in real-time
response = requests.post(
    "http://localhost:4123/v1/audio/speech/stream",
    json={
        "input": "This will stream as it's generated!",
        "exaggeration": 0.8
    },
    stream=True  # Enable streaming mode
)

with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        if chunk:
            f.write(chunk)
            print(f"Received chunk: {len(chunk)} bytes")

SSE Streaming Example (OpenAI Compatible)

import requests
import json
import base64

# Stream audio using the Server-Sent Events format
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "This streams as Server-Sent Events!",
        "stream_format": "sse",
        "exaggeration": 0.8
    },
    stream=True,
    headers={'Accept': 'text/event-stream'}
)

audio_chunks = []

for line in response.iter_lines(decode_unicode=True):
    if line.startswith('data: '):
        event_data = line[6:]  # Remove 'data: ' prefix

        try:
            event = json.loads(event_data)

            if event.get('type') == 'speech.audio.delta':
                # Decode base64 audio chunk
                audio_data = base64.b64decode(event['audio'])
                audio_chunks.append(audio_data)
                print(f"Received audio chunk: {len(audio_data)} bytes")

            elif event.get('type') == 'speech.audio.done':
                usage = event.get('usage', {})
                print(f"Complete! Tokens: {usage.get('total_tokens', 0)}")
                break
        except json.JSONDecodeError:
            continue  # skip malformed or non-JSON event lines

print(f"Received {len(audio_chunks)} audio chunks")

πŸ“š Complete Streaming Examples & Documentation β†’

Including real-time playback, progress monitoring, custom voice uploads, and advanced integration patterns.

Voice File Requirements

Supported Formats:

  • MP3 (.mp3)
  • WAV (.wav)
  • FLAC (.flac)
  • M4A (.m4a)
  • OGG (.ogg)

Requirements:

  • Maximum file size: 10MB
  • Recommended duration: 10-30 seconds of clear speech
  • Avoid background noise for best results
  • Higher quality audio produces better voice cloning

πŸŽ›οΈ Configuration

The project provides two environment example files:

  • .env.example - For local development (uses ./models, ./voice-sample.mp3)
  • .env.example.docker - For Docker deployment (uses /cache, /app/voice-sample.mp3)

Choose the appropriate one for your setup:

# For local development
cp .env.example .env

# For Docker deployment
cp .env.example.docker .env

Key environment variables (see the example files for full list):

| Variable | Default | Description |
|----------|---------|-------------|
| PORT | 4123 | API server port |
| USE_MULTILINGUAL_MODEL | true | Enable multilingual (22-language) support |
| EXAGGERATION | 0.5 | Emotion intensity (0.25-2.0) |
| CFG_WEIGHT | 0.5 | Pace control (0.0-1.0) |
| TEMPERATURE | 0.8 | Sampling randomness (0.05-5.0) |
| VOICE_SAMPLE_PATH | ./voice-sample.mp3 | Voice sample for cloning |
| DEVICE | auto | Device (auto/cuda/mps/cpu) |
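
For illustration, this is roughly how a script might read the same variables with the documented defaults (a sketch, not the app's actual config module):

import os

# Illustrative only: documented variables with their documented defaults
PORT = int(os.getenv("PORT", "4123"))
USE_MULTILINGUAL_MODEL = os.getenv("USE_MULTILINGUAL_MODEL", "true").lower() == "true"
EXAGGERATION = float(os.getenv("EXAGGERATION", "0.5"))
CFG_WEIGHT = float(os.getenv("CFG_WEIGHT", "0.5"))
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.8"))
VOICE_SAMPLE_PATH = os.getenv("VOICE_SAMPLE_PATH", "./voice-sample.mp3")
DEVICE = os.getenv("DEVICE", "auto")

print(f"Port {PORT}, device {DEVICE}, multilingual {USE_MULTILINGUAL_MODEL}")
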
🎭 Voice Cloning

Replace the default voice sample:

# Replace the default voice sample
cp your-voice.mp3 voice-sample.mp3

# Or set a custom path
echo "VOICE_SAMPLE_PATH=/path/to/your/voice.mp3" >> .env

For best results:

  • Use 10-30 seconds of clear speech
  • Avoid background noise
  • Prefer WAV or high-quality MP3

🐳 Docker Deployment

Development

docker compose -f docker/docker-compose.yml up

Production

# Create production environment
cp .env.example.docker .env
nano .env  # Set production values

# Deploy
docker compose -f docker/docker-compose.yml up -d

With GPU Support

# Use GPU-enabled compose file
# Ensure NVIDIA Container Toolkit is installed
docker compose -f docker/docker-compose.gpu.yml up -d

πŸ“š API Reference

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /audio/speech | POST | Generate speech from text (complete) |
| /audio/speech/upload | POST | Generate speech with voice upload |
| /audio/speech/stream | POST | Stream speech generation (docs) |
| /audio/speech/stream/upload | POST | Stream speech with voice upload (docs) |
| /voices | GET | List voices in library (with language metadata) |
| /voices | POST | Upload voice to library (with language support) |
| /languages | GET | Get supported languages (docs) |
| /health | GET | Health check and status |
| /config | GET | Current configuration |
| /v1/models | GET | Available models (OpenAI compat) |
| /status | GET | TTS processing status & progress |
| /status/progress | GET | Real-time progress (lightweight) |
| /status/statistics | GET | Processing statistics |
| /status/history | GET | Recent request history |
| /info | GET | Complete API information |
| /docs | GET | Interactive API documentation |
| /redoc | GET | Alternative API documentation |
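
Because the first request after startup can trigger a slow model download and load, it helps to wait on /health before sending traffic. A minimal sketch:

import time
import requests

def wait_for_api(base_url: str = "http://localhost:4123", timeout: float = 180.0) -> bool:
    """Poll /health until the API responds OK or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).ok:
                return True
        except requests.RequestException:
            pass  # API not up yet; keep polling
        time.sleep(2)
    return False

if wait_for_api():
    print("API is ready")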

Parameters Reference

Speech Generation Parameters

Exaggeration (0.25-2.0)

  • 0.3-0.4: Professional, neutral
  • 0.5: Default balanced
  • 0.7-0.8: More expressive
  • 1.0+: Very dramatic

CFG Weight (0.0-1.0)

  • 0.2-0.3: Faster speech
  • 0.5: Default pace
  • 0.7-0.8: Slower, deliberate

Temperature (0.05-5.0)

  • 0.4-0.6: More consistent
  • 0.8: Default balance
  • 1.0+: More creative/random

Stream Format

  • audio: Raw audio streaming (default)
  • sse: Server-Side Events with base64-encoded audio chunks (OpenAI compatible)
🧠 Memory Management

The API includes advanced memory management to prevent memory leaks and optimize performance:

Memory Management Features

  • Automatic Cleanup: Periodic garbage collection and tensor cleanup
  • CUDA Memory Management: Automatic GPU cache clearing
  • Memory Monitoring: Real-time memory usage tracking
  • Manual Controls: API endpoints for manual cleanup operations

Memory Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| MEMORY_CLEANUP_INTERVAL | 5 | Clean up memory every N requests |
| CUDA_CACHE_CLEAR_INTERVAL | 3 | Clear CUDA cache every N requests |
| ENABLE_MEMORY_MONITORING | true | Enable detailed memory logging |

Memory Monitoring Endpoints

# Get memory status
curl http://localhost:4123/memory

# Trigger manual cleanup
curl "http://localhost:4123/memory?cleanup=true&force_cuda_clear=true"

# Reset memory tracking (with confirmation)
curl -X POST "http://localhost:4123/memory/reset?confirm=true"

Real-time Status Tracking

Monitor TTS processing in real-time:

# Check current processing status
curl "http://localhost:4123/v1/status/progress"

# Get detailed status with memory and stats
curl "http://localhost:4123/v1/status?include_memory=true&include_stats=true"

# View processing statistics
curl "http://localhost:4123/v1/status/statistics"

# Check request history
curl "http://localhost:4123/v1/status/history?limit=5"

# Get comprehensive API information
curl "http://localhost:4123/info"

Status Response Example:

{
  "is_processing": true,
  "status": "generating_audio",
  "current_step": "Generating audio for chunk 2/4",
  "current_chunk": 2,
  "total_chunks": 4,
  "progress_percentage": 50.0,
  "duration_seconds": 2.5,
  "text_preview": "Your text being processed..."
}

See Status API Documentation for complete details.
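
For illustration, a small Python poller built on the progress endpoint; the field names follow the status example above, though the lightweight endpoint may return a subset of them:

import time
import requests

while True:
    progress = requests.get("http://localhost:4123/v1/status/progress").json()
    if not progress.get("is_processing"):
        print("Idle: nothing being processed")
        break
    pct = progress.get("progress_percentage") or 0
    print(f"{pct:.0f}% - {progress.get('current_step', '')}")
    time.sleep(0.5)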

Memory Testing

Run the memory management test suite:

# Test memory patterns and cleanup
python tests/test_memory.py  # or: uv run tests/test_memory.py

# Monitor memory during testing
watch -n 1 'curl -s http://localhost:4123/memory | jq .memory_info'

Memory Optimization Tips

For High-Volume Production:

MEMORY_CLEANUP_INTERVAL=3
CUDA_CACHE_CLEAR_INTERVAL=2
ENABLE_MEMORY_MONITORING=false  # Reduce logging overhead
MAX_CHUNK_LENGTH=200             # Smaller chunks for less memory usage

For Development/Debugging:

MEMORY_CLEANUP_INTERVAL=1
CUDA_CACHE_CLEAR_INTERVAL=1
ENABLE_MEMORY_MONITORING=true

Memory Leak Prevention:

  • Tensors are automatically moved to CPU before deletion
  • Gradient tracking is disabled during inference
  • Audio chunks are cleaned up after concatenation
  • CUDA cache is periodically cleared
  • Python garbage collection is triggered regularly
πŸ§ͺ Testing

Run the test script to verify the API functionality:

python tests/test_api.py

The test script will:

  • Test health check endpoint
  • Test models endpoint
  • Test API documentation endpoints (new!)
  • Generate speech for various text lengths
  • Test custom parameter validation
  • Test error handling with validation
  • Save generated audio files as test_output_*.wav

⚑ Performance

FastAPI Benefits:

  • Async support: Better concurrent request handling
  • Faster serialization: JSON responses ~25% faster than Flask
  • Type validation: Pydantic models prevent invalid requests
  • Auto documentation: No manual API doc maintenance

Hardware Recommendations:

  • CPU: Works but slower, reduce chunk size for better memory usage
  • GPU: Recommended for production, significantly faster
  • Memory: 4GB minimum, 8GB+ recommended
  • Concurrency: Async support allows better multi-request handling

πŸ”§ Troubleshooting

Common Issues

CUDA/CPU Compatibility Error

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

This happens because chatterbox-tts models require PyTorch with CUDA support, even when running on CPU. Solutions:

# Option 1: Use default setup (now includes CUDA-enabled PyTorch)
docker compose -f docker/docker-compose.yml up -d

# Option 2: Use explicit CUDA setup (traditional)
docker compose -f docker/docker-compose.gpu.yml up -d

# Option 3: Use uv + GPU setup (recommended for GPU users)
docker compose -f docker/docker-compose.uv.gpu.yml up -d

# Option 4: Use CPU-only setup (may have compatibility issues)
docker compose -f docker/docker-compose.cpu.yml up -d

# Option 5: Clear model cache and retry with CUDA-enabled setup
docker volume rm chatterbox-tts-api_chatterbox-models
docker compose -f docker/docker-compose.yml up -d --build

# Option 6: Try uv for better dependency resolution
uv sync
uv run uvicorn app.main:app --host 0.0.0.0 --port 4123

For local development, install PyTorch with CUDA support:

# With pip
pip uninstall torch torchvision torchaudio
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/travisvn/chatterbox-multilingual.git@exp

# With uv (handles this automatically)
uv sync

Windows users using pip and having issues:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --force-reinstall
pip install --force-reinstall typing_extensions

Port conflicts

# Change port
echo "PORT=4124" >> .env

GPU not detected

# Force CPU mode
echo "DEVICE=cpu" >> .env

Out of memory

# Reduce chunk size
echo "MAX_CHUNK_LENGTH=200" >> .env

Model download fails

# Clear cache and retry
rm -rf models/
uvicorn app.main:app --host 0.0.0.0 --port 4123  # or: uv run main.py

FastAPI startup issues

# Check if uvicorn is installed
uvicorn --version

# Run with verbose logging
uvicorn app.main:app --host 0.0.0.0 --port 4123 --log-level debug

# Alternative startup method
python main.py

πŸ’» Development

Project Structure

This project follows a clean, modular architecture for maintainability:

app/                     # FastAPI backend application
β”œβ”€β”€ __init__.py           # Main package
β”œβ”€β”€ config.py            # Configuration management
β”œβ”€β”€ main.py              # FastAPI application
β”œβ”€β”€ models/              # Pydantic models
β”‚   β”œβ”€β”€ requests.py      # Request models
β”‚   └── responses.py     # Response models
β”œβ”€β”€ core/                # Core functionality
β”‚   β”œβ”€β”€ memory.py        # Memory management
β”‚   β”œβ”€β”€ text_processing.py # Text processing utilities
β”‚   └── tts_model.py     # TTS model management
└── api/                 # API endpoints
    β”œβ”€β”€ router.py        # Main router
    └── endpoints/       # Individual endpoint modules
        β”œβ”€β”€ speech.py    # TTS endpoint
        β”œβ”€β”€ health.py    # Health check
        β”œβ”€β”€ models.py    # Model listing
        β”œβ”€β”€ memory.py    # Memory management
        └── config.py    # Configuration

frontend/                # React frontend application
β”œβ”€β”€ src/
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ nginx.conf          # Integrated proxy configuration
└── package.json

docker/                  # Docker files consolidated
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ Dockerfile.uv       # uv-optimized image
β”œβ”€β”€ Dockerfile.gpu      # GPU-enabled image
β”œβ”€β”€ Dockerfile.cpu      # CPU-only image
β”œβ”€β”€ Dockerfile.uv.gpu   # uv + GPU image
β”œβ”€β”€ docker-compose.yml  # Standard deployment
β”œβ”€β”€ docker-compose.uv.yml # uv deployment
β”œβ”€β”€ docker-compose.gpu.yml # GPU deployment
β”œβ”€β”€ docker-compose.uv.gpu.yml # uv + GPU deployment
└── docker-compose.cpu.yml # CPU-only deployment

tests/                   # Test suite
β”œβ”€β”€ test_api.py         # API tests
└── test_memory.py      # Memory tests

main.py                  # Main entry point
start.py                 # Development helper script

Quick Start Scripts

# Development mode with auto-reload
python start.py dev

# Production mode
python start.py prod

# Full Stack mode with UI (using Docker)
python start.py fullstack

# Run tests
python start.py test

# View project structure
python start.py info

Local Development

# Install in development mode (pip)
pip install -e .

# Or with uv (basic development tools)
uv sync

# Or with test dependencies (for contributors)
uv sync --group test

# Start with auto-reload (FastAPI development)
uvicorn app.main:app --host 0.0.0.0 --port 4123 --reload

# Or use the main script
python main.py

# Or use the development helper
python start.py dev

Testing

# Run API tests
python tests/test_api.py  # or: uv run tests/test_api.py

# Run memory tests
python tests/test_memory.py

# Test specific endpoint
curl http://localhost:4123/health

# Check API documentation
curl http://localhost:4123/openapi.json

FastAPI Development Features

  • Auto-reload: Use --reload flag for development
  • Interactive docs: Visit /docs for live API testing
  • Type hints: Full IDE support with Pydantic models
  • Validation: Automatic request/response validation
  • Modular structure: Easy to extend and maintain

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Ensure FastAPI docs are updated
  6. Submit a pull request

Issues with Multilingual?

Fall back to the last known good (LKG) release from before multilingual support:

git clone --branch stable https://github.com/travisvn/chatterbox-tts-api

View the stable branch for the corresponding install and troubleshooting documentation.

πŸ”— Integrations

Open WebUI

TIP

Customize available voices first by using the frontend at http://localhost:4321

To use Chatterbox TTS API with Open WebUI, follow these steps:

  • Open the Admin Panel and go to Settings -> Audio
  • Set your TTS Settings to match the following:
    • Text-to-Speech Engine: OpenAI
    • API Base URL: http://localhost:4123/v1 # alternatively, try http://host.docker.internal:4123/v1
    • API Key: none
    • TTS Model: tts-1 or tts-1-hd
    • TTS Voice: Name of the voice you've cloned (can also include aliases, defined in the frontend)
    • Response splitting: Paragraphs

[Screenshot: settings to integrate Chatterbox TTS API with Open WebUI]

➑️ View the Open WebUI docs for installing Chatterbox TTS API