Multilingual Support Documentation

22 Languages Supported · chatterbox-tts v0.1.4 · OpenAI Compatible

Overview

Chatterbox TTS API supports multilingual text-to-speech generation across 22 languages using the enhanced chatterbox-tts v0.1.4 multilingual model. This feature enables high-quality voice cloning and speech synthesis in multiple languages while maintaining full OpenAI API compatibility.

Key Features

🌍 22 Languages Supported - Generate speech in Arabic, Chinese, English, French, German, Italian, Japanese, Spanish, and more
🎭 Language-Aware Voice Cloning - Upload voices with specific language assignments
🔄 Automatic Language Detection - Speech generation automatically uses the voice's assigned language
🧠 Smart Fallbacks - Graceful handling of missing languages with English fallback
📚 Voice Library Integration - Language metadata stored with each voice
⚙️ Configurable - Enable/disable multilingual mode via environment variables
🔗 OpenAI Compatible - No breaking changes to existing API endpoints
📱 Frontend Support - Language selection UI with flags and native names

Supported Languages

The multilingual model supports the following 22 languages:

| Code | Language | Native Name | Flag |
|------|------------|---------------|------|
| ar | Arabic | العربية | 🇸🇦 |
| da | Danish | Dansk | 🇩🇰 |
| de | German | Deutsch | 🇩🇪 |
| el | Greek | Ελληνικά | 🇬🇷 |
| en | English | English | 🇺🇸 |
| es | Spanish | Español | 🇪🇸 |
| fi | Finnish | Suomi | 🇫🇮 |
| fr | French | Français | 🇫🇷 |
| he | Hebrew | עברית | 🇮🇱 |
| hi | Hindi | हिन्दी | 🇮🇳 |
| it | Italian | Italiano | 🇮🇹 |
| ja | Japanese | 日本語 | 🇯🇵 |
| ko | Korean | 한국어 | 🇰🇷 |
| ms | Malay | Bahasa Melayu | 🇲🇾 |
| nl | Dutch | Nederlands | 🇳🇱 |
| no | Norwegian | Norsk | 🇳🇴 |
| pl | Polish | Polski | 🇵🇱 |
| pt | Portuguese | Português | 🇵🇹 |
| ru | Russian | Русский | 🇷🇺 |
| sv | Swedish | Svenska | 🇸🇪 |
| sw | Swahili | Kiswahili | 🇹🇿 |
| tr | Turkish | Türkçe | 🇹🇷 |

Note: Chinese (zh) support is available in the model but currently disabled. Contact support if you need Chinese language support.

Configuration

Enable/Disable Multilingual Mode

Multilingual support is controlled by the USE_MULTILINGUAL_MODEL environment variable:

# Enable multilingual support (default)
USE_MULTILINGUAL_MODEL=true

# Disable multilingual support (English only)
USE_MULTILINGUAL_MODEL=false

Default Behavior:

  • Multilingual mode is enabled by default (true)
  • When disabled, only English is supported
  • Existing installations automatically get multilingual support
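
How the server interprets the flag internally is not specified here, but a typical boolean env-var parse that matches the documented default (enabled) would look like this sketch; `use_multilingual_model` is a hypothetical helper, not part of the API:

```python
import os

def use_multilingual_model() -> bool:
    """Return True unless USE_MULTILINGUAL_MODEL is explicitly falsy.
    Mirrors the documented default: multilingual mode is on."""
    value = os.getenv("USE_MULTILINGUAL_MODEL", "true")
    return value.strip().lower() not in ("false", "0", "no")
```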

Environment Variables

Add to your .env file:

# Multilingual TTS Configuration
USE_MULTILINGUAL_MODEL=true   # Enable 22-language support (default: true)

API Usage

1. Get Supported Languages

Retrieve the list of languages supported by your current configuration:

curl http://localhost:4123/languages

Response (Multilingual Mode):

{
  "languages": [
    { "code": "ar", "name": "Arabic" },
    { "code": "da", "name": "Danish" },
    { "code": "de", "name": "German" }
    // ... all 22 languages
  ],
  "count": 22,
  "model_type": "multilingual"
}

Response (Standard Mode):

{
  "languages": [{ "code": "en", "name": "English" }],
  "count": 1,
  "model_type": "standard"
}

2. Upload Voice with Language

Upload a voice sample and assign a specific language:

curl -X POST http://localhost:4123/voices \
  -F "voice_name=french_speaker" \
  -F "language=fr" \
  -F "voice_file=@french_voice.wav"

Parameters:

  • voice_name: Unique identifier for the voice
  • language: ISO 639-1 language code (e.g., fr, de, ja)
  • voice_file: Audio file in supported format

Language Validation:

  • Language codes are validated against supported languages
  • Invalid codes return a clear error message
  • Defaults to "en" if not specified
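
Clients can mirror this server-side validation before uploading, using the codes returned by `/languages`. The helper below is a hypothetical client-side sketch, not part of the API:

```python
def resolve_language(code, supported_codes):
    """Pre-check a language code the way the server validates it:
    default to 'en' when omitted, raise for unsupported codes."""
    if code is None:
        return "en"  # server default when language is not specified
    if code not in supported_codes:
        raise ValueError(
            f"Unsupported language code: {code}. "
            f"Supported: {', '.join(sorted(supported_codes))}"
        )
    return code
```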

3. Generate Multilingual Speech

Once a voice is uploaded with a language, speech generation automatically uses the correct language:

# Generate French speech using French voice
curl -X POST http://localhost:4123/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Bonjour, comment allez-vous?",
    "voice": "french_speaker"
  }' \
  --output french_speech.wav

Key Points:

  • No language parameter needed in speech requests (OpenAI compatibility)
  • Language is automatically determined from voice metadata
  • Text can be in any language - the model handles cross-lingual synthesis
  • All standard TTS parameters work with multilingual voices
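
Because the language travels with the voice, request payloads stay identical to the single-language case. A hypothetical helper sketching this (the function name and structure are ours, only the payload shape comes from the API above):

```python
def build_speech_request(text, voice, **extra):
    """Build an OpenAI-style /v1/audio/speech payload.
    Deliberately has no 'language' field: the server resolves the
    language from the voice's stored metadata."""
    payload = {"input": text, "voice": voice}
    payload.update(extra)  # e.g. exaggeration, chunk_strategy
    return payload
```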

4. Voice Library with Language Metadata

List voices to see language information:

curl http://localhost:4123/voices

Response:

{
  "voices": [
    {
      "name": "french_speaker",
      "file_path": "/voices/french_speaker.wav",
      "aliases": [],
      "metadata": {
        "language": "fr",
        "created_at": "2024-01-15T10:30:00Z",
        "file_size": 2048576,
        "duration": 12.5
      }
    }
  ],
  "count": 1
}
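
Given the response shape above, a client can index its voice library by language. The grouping helper below is our own sketch over sample data in that shape, not an API feature:

```python
from collections import defaultdict

def voices_by_language(voices_response):
    """Group a GET /voices response by each voice's stored language,
    defaulting voices without language metadata to 'en'."""
    grouped = defaultdict(list)
    for voice in voices_response.get("voices", []):
        lang = voice.get("metadata", {}).get("language", "en")
        grouped[lang].append(voice["name"])
    return dict(grouped)
```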

Advanced Usage Examples

Python Examples

Upload and Use Multilingual Voice

import requests

# Upload a German voice
with open("german_speaker.wav", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/voices",
        data={
            "voice_name": "german_narrator",
            "language": "de"
        },
        files={
            "voice_file": ("german_speaker.wav", voice_file, "audio/wav")
        }
    )

print(f"Upload status: {response.status_code}")

# Generate German speech
response = requests.post(
    "http://localhost:4123/v1/audio/speech",
    json={
        "input": "Guten Tag! Wie geht es Ihnen heute?",
        "voice": "german_narrator",
        "exaggeration": 0.8
    }
)

with open("german_output.wav", "wb") as f:
    f.write(response.content)

Batch Upload Multiple Languages

import requests
import os

voices = [
    {"file": "spanish_voice.wav", "name": "spanish_speaker", "lang": "es"},
    {"file": "italian_voice.wav", "name": "italian_speaker", "lang": "it"},
    {"file": "japanese_voice.wav", "name": "japanese_speaker", "lang": "ja"},
]

for voice in voices:
    with open(voice["file"], "rb") as f:
        response = requests.post(
            "http://localhost:4123/voices",
            data={
                "voice_name": voice["name"],
                "language": voice["lang"]
            },
            files={"voice_file": f}
        )
    print(f"Uploaded {voice['name']}: {response.status_code}")

Generate Speech in Multiple Languages

import requests

texts = [
    {"text": "Hello, how are you today?", "voice": "english_speaker"},
    {"text": "Hola, ¿cómo estás hoy?", "voice": "spanish_speaker"},
    {"text": "Ciao, come stai oggi?", "voice": "italian_speaker"},
    {"text": "こんにちは、今日はいかがですか?", "voice": "japanese_speaker"},
]

for i, item in enumerate(texts):
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        json={
            "input": item["text"],
            "voice": item["voice"]
        }
    )

    with open(f"multilingual_output_{i+1}.wav", "wb") as f:
        f.write(response.content)

Streaming with Multilingual Voices

# Stream Japanese speech
curl -X POST http://localhost:4123/v1/audio/speech/stream \
  -H "Content-Type: application/json" \
  -d '{
    "input": "こんにちは。私の名前は田中です。よろしくお願いします。",
    "voice": "japanese_speaker",
    "chunk_strategy": "sentence"
  }' \
  --output japanese_stream.wav

Voice Upload with Custom Parameters

# Upload a voice with an explicit language assignment
curl -X POST http://localhost:4123/voices \
  -F "voice_name=professional_german" \
  -F "language=de" \
  -F "voice_file=@professional_voice.wav"

Frontend Integration

The web UI includes comprehensive multilingual support:

Language Selection

  • Dropdown with native language names and flag emojis
  • Automatic validation against supported languages
  • Default selection to English

Voice Library Display

  • Language badges next to each voice
  • Flag emojis for visual identification
  • Sorting and filtering by language

Upload Interface

  • Language selection integrated into voice upload modal
  • Real-time validation and feedback
  • Intuitive language picker with search

Technical Implementation

Architecture

The multilingual implementation consists of several key components:

  1. Model Loading: Automatic detection and loading of multilingual vs standard TTS model
  2. Language Detection: Voice metadata stores language information
  3. Speech Generation: Automatic language parameter injection based on voice metadata
  4. API Compatibility: Maintains OpenAI API format without breaking changes

Model Switching

# Automatic model selection based on configuration
if Config.USE_MULTILINGUAL_MODEL:
    model = ChatterboxMultilingualTTS(...)
    supported_languages = SUPPORTED_LANGUAGES
else:
    model = ChatterboxTTS(...)
    supported_languages = {"en": "English"}

Language Resolution

def resolve_voice_path_and_language(voice_name_or_path):
    """Resolve voice path and extract language metadata"""
    if voice_name_or_path in voice_library:
        voice_info = voice_library.get_voice_info(voice_name_or_path)
        return voice_info.path, voice_info.language
    else:
        return voice_name_or_path, "en"  # Default to English

Backward Compatibility

  • Existing voices: Automatically assigned English ("en") language
  • Existing API calls: Continue to work without modification
  • Configuration: Multilingual mode can be disabled for compatibility
  • Graceful degradation: Falls back to English for unsupported languages
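
One way the graceful degradation described above could work, shown as a sketch rather than the server's actual code path:

```python
def effective_language(requested, supported_codes):
    """Graceful degradation: use the requested language when the
    loaded model supports it, otherwise fall back to English."""
    return requested if requested in supported_codes else "en"
```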

Performance Considerations

Memory Usage

  • Multilingual model requires slightly more memory than standard model
  • Language switching doesn't require model reloading
  • Voice library scales efficiently with multiple languages

Generation Speed

  • Multilingual generation performance is comparable to standard model
  • Language-specific optimizations built into the model
  • Streaming maintains low latency across all languages

Storage

  • Voice files stored with language metadata in JSON format
  • No additional storage overhead for multilingual support
  • Efficient indexing by language for large voice libraries
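
A minimal sketch of per-voice JSON metadata matching the fields shown in the voice library responses above; the real on-disk layout of the voice library may differ:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def write_voice_metadata(voice_dir, name, language, file_size, duration):
    """Write a per-voice JSON record with the documented metadata fields."""
    record = {
        "language": language,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "file_size": file_size,
        "duration": duration,
    }
    path = Path(voice_dir) / f"{name}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```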

Troubleshooting

Common Issues

Languages endpoint returns only English

# Check multilingual configuration
curl http://localhost:4123/config | grep USE_MULTILINGUAL_MODEL

Voice upload fails with language validation error

{
  "error": {
    "message": "Unsupported language code: xx. Supported: ar, da, de, ...",
    "type": "language_validation_error"
  }
}

Speech generation ignores voice language

  • Ensure voice was uploaded with correct language parameter
  • Check voice metadata: curl http://localhost:4123/voices
  • Verify multilingual mode is enabled

Debugging

To debug multilingual issues, inspect the current configuration and metadata:

# Check current configuration
curl http://localhost:4123/config

# Verify supported languages
curl http://localhost:4123/languages

# Check voice metadata
curl http://localhost:4123/voices

Migration Guide

From Standard to Multilingual

  1. Update dependencies (already done in v0.1.4):

    uv sync  # or pip install -r requirements.txt
    
  2. Enable multilingual mode:

    echo "USE_MULTILINGUAL_MODEL=true" >> .env
    
  3. Restart the API:

    uv run main.py  # or python main.py
    
  4. Upload new voices with languages:

    curl -X POST http://localhost:4123/voices \
      -F "voice_name=multilingual_voice" \
      -F "language=fr" \
      -F "voice_file=@voice_sample.wav"
    

Existing Voice Library

  • Existing voices continue to work unchanged
  • All existing voices default to English ("en")
  • Optionally re-upload voices with correct language assignments
  • No data loss or corruption

Best Practices

Voice Quality Guidelines

  1. Language-Specific Recordings:

    • Use native speakers for each language
    • Record in the target language for best results
    • Avoid mixing languages within a single voice sample
  2. Audio Quality:

    • 10-30 seconds of clear speech
    • Consistent speaking pace and tone
    • Minimal background noise
    • High-quality audio format (WAV preferred)
  3. Voice Naming:

    • Include language in voice names: french_narrator, spanish_casual
    • Use descriptive names for different styles: german_formal, italian_cheerful
    • Consider voice characteristics: japanese_female_young, arabic_male_deep

Multilingual Workflows

  1. Development:

    • Test with multiple languages during development
    • Validate language assignment for uploaded voices
    • Use streaming for better user experience with longer texts
  2. Production:

    • Monitor memory usage with multiple language models
    • Implement proper error handling for unsupported languages
    • Consider caching frequently used voice/language combinations
  3. Content Management:

    • Organize voices by language and use case
    • Document voice characteristics and appropriate use cases
    • Maintain consistent quality standards across languages

API Reference

Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| /languages | GET | Get supported languages |
| /voices | POST | Upload voice with language |
| /voices | GET | List voices with language metadata |
| /v1/audio/speech | POST | Generate speech (language auto-detected) |
| /v1/audio/speech/stream | POST | Stream speech generation |

Request/Response Models

SupportedLanguageItem

{
  "code": "fr",
  "name": "French"
}

SupportedLanguagesResponse

{
  "languages": [SupportedLanguageItem],
  "count": 22,
  "model_type": "multilingual"
}

VoiceLibraryItem

{
  "name": "french_speaker",
  "file_path": "/voices/french_speaker.wav",
  "aliases": [],
  "metadata": {
    "language": "fr",
    "created_at": "2024-01-15T10:30:00Z",
    "file_size": 2048576,
    "duration": 12.5
  }
}

Built with chatterbox-tts v0.1.4 • Supports 22 languages • OpenAI API Compatible