Changelog
All notable changes to the Chatterbox TTS API project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[2.1.0] - 2025-09-15
Feature: Long Text Synthesis
This release introduces the capability to synthesize long-form text by splitting it into manageable chunks, processing them in the background, and concatenating the final audio.
The frontend now allows for long text inputs and has been revamped completely to be able to handle the status of the processing, to keep a history of previous long-text speech generations (and their configurations), etc.
Added
Long Text Job Processing
- Asynchronous Job Handling: Implemented a background processing system to handle long text synthesis jobs without blocking the API.
- New API Endpoints: Introduced a new set of endpoints for creating long text jobs, querying their status, and performing bulk actions.
- Audio Concatenation: Integrated
pydub
to seamlessly concatenate generated audio chunks into a single, coherent audio file.
Frontend Integration
- Job Management UI: Added a new section in the frontend to submit text for long synthesis, monitor job progress, and download the final audio.
- Real-time Status Updates: The UI now displays the real-time status and progress of ongoing long text jobs.
Configuration & Dependencies
- New Configuration: Added environment variables to
.env.example
and.env.example.docker
for configuring long text TTS behavior. - New Dependencies: Added
pydub
for audio manipulation andsse-starlette
to support job processing.
Enhanced
Input Handling
- Validation and Error Handling: Improved validation for long text inputs to ensure job stability and provide clearer error feedback to the user.
[2.0.0] — 2025-09-10
Major Release: Multilingual Support
This major version introduces comprehensive multilingual text-to-speech capabilities, supporting 22 languages with enhanced voice cloning and automatic language detection. This release represents a significant architectural upgrade to the TTS engine.
🌍 Added
Multilingual TTS Engine
- 22 Language Support: Complete multilingual text-to-speech generation using the enhanced
chatterbox-tts
v0.1.4 multilingual model - Supported Languages: Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish
- Language-Aware Voice Cloning: Upload and manage voice samples with specific language assignments
- Automatic Language Detection: Speech generation automatically uses the voice's assigned language without requiring language parameters in requests
New API Endpoints
GET /languages
: Retrieve list of all supported languages with codes and names- Enhanced Voice Upload: Voice upload endpoint now accepts optional
language
parameter with validation against supported languages - Language Metadata: Voice library responses now include language information in metadata
Configuration Management
USE_MULTILINGUAL_MODEL
: New environment variable to enable/disable multilingual support (default:true
)- Automatic Model Selection: System automatically selects multilingual or standard model based on configuration
- Graceful Fallbacks: Falls back to English for unsupported languages or when multilingual mode is disabled
Frontend Multilingual Integration
- Language Selection UI: Complete language picker with native names and flag emojis
- Voice Library Language Display: Language badges and indicators for each voice in the library
- Upload Modal Enhancement: Integrated language selector in voice upload interface with real-time validation
- Visual Language Indicators: Flag emojis and language codes throughout the user interface
Developer Experience
- Comprehensive Documentation: New
docs/MULTILINGUAL.md
with complete API reference and usage examples - Testing Guide:
docs/MULTILINGUAL_TESTING_GUIDE.md
with step-by-step testing instructions - Language Constants: Frontend language utilities and constants for consistent UI display
- API Examples: Multilingual usage examples in Python, JavaScript, and cURL
🔄 Enhanced
Backend Architecture
- Speech Generation Pipeline: Updated all speech endpoints to use
resolve_voice_path_and_language()
for automatic language detection - Streaming Support: All streaming endpoints now support multilingual voices with language-aware processing
- Voice Library System: Enhanced voice metadata storage to include language information with backward compatibility
- Memory Management: Optimized memory handling for multilingual model operations
API Improvements
- Request Validation: Enhanced voice upload validation with language code verification
- Response Models: Updated voice library responses to include language metadata
- Error Handling: Improved error messages for language validation and unsupported language scenarios
- OpenAI Compatibility: Maintained full OpenAI API compatibility while adding multilingual capabilities
Frontend Enhancements
- Language Utilities: New frontend utilities for language display, validation, and flag emoji mapping
- State Management: Enhanced voice library state management to handle language metadata
- User Experience: Improved upload workflow with language selection and validation feedback
- Responsive Design: Language selector components optimized for mobile and desktop interfaces
🔧 Technical Implementation
Core Components Added
app/core/mtl.py
: Multilingual support module with language constants and utilities- Language Resolution:
resolve_voice_path_and_language()
function for automatic language detection - Model Loading: Automatic multilingual vs standard model selection based on configuration
- Language Validation: Comprehensive validation against supported language codes
Frontend Components Added
- Language Selection Components: Dropdown selectors with native names and flag emojis
- Language Display Utilities: Consistent language representation across the interface
- Voice Upload Enhancements: Integrated language selection in upload modal
- Language Constants: Complete mapping of language codes to names and flags
Database & Storage
- Voice Metadata Enhancement: Extended voice storage to include language information
- Backward Compatibility: Existing voices automatically assigned English (
"en"
) language - Migration Support: Seamless upgrade path for existing voice libraries
🚀 Breaking Changes
Dependency Updates
- chatterbox-tts Version: Upgraded from
1.0.4
to0.1.4
(breaking change requiring dependency update) - Model Architecture: New multilingual model may require initial download and setup
- Python Compatibility: Enhanced compatibility testing for Python 3.11+ environments
📚 Documentation
New Documentation
docs/MULTILINGUAL.md
: Comprehensive multilingual feature documentation with API referencedocs/MULTILINGUAL_TESTING_GUIDE.md
: Step-by-step testing guide for multilingual functionality- README Updates: Enhanced main documentation with multilingual usage examples and configuration guide
API Documentation
- Language Endpoints: Complete API reference for new language-related endpoints
- Usage Examples: Multilingual examples in multiple programming languages
- Migration Guide: Instructions for upgrading existing installations to multilingual support
🔄 Migration Notes
For Existing Users
- Automatic Upgrade: Multilingual support is enabled by default for new installations
- Existing Voices: All existing voices continue to work unchanged (default to English)
- Configuration: Set
USE_MULTILINGUAL_MODEL=false
to disable multilingual support if needed - Dependencies: Run
uv sync
orpip install -r requirements.txt
to update to new chatterbox-tts version
Compatibility
- API Compatibility: All existing API endpoints remain unchanged and fully compatible
- Voice Library: Existing voice libraries are automatically migrated with language metadata
- Frontend: Existing frontend installations receive multilingual UI enhancements automatically
🎯 Performance
Model Optimization
- Memory Management: Optimized memory usage for multilingual model operations
- Language Switching: Efficient language switching without model reloading
- Streaming Performance: Maintained low-latency streaming across all supported languages
Quality Improvements
- Voice Quality: Enhanced voice cloning quality with language-specific optimizations
- Speech Synthesis: Improved speech generation accuracy for non-English languages
- Cross-lingual Support: Better handling of mixed-language content
[1.6.1] - 2025-09-06
Fixed
- Dependency Pinning for
chatterbox-tts
: Pinned thechatterbox-tts
dependency to a specific version across all installation methods (pip
,uv pip
,requirements.txt
, Dockerfiles). This ensures continued functionality and prevents breakage from recent upstream changes in the Resemble AI chatterbox library.
Added
- Experimental Blackwell GPU Support: Introduced a new, experimental Docker configuration for users with NVIDIA Blackwell GPUs. While functionality is not guaranteed, it provides a template for running the service on this new architecture, both within Docker and as a guide for local setups.
[1.6.0] - 2025-07-02
Added
- SSE Streaming: Introduction of Server-Side Events (SSE) streaming functionality for real-time audio generation that more closely aligns with the OpenAI Speech API endpoint
- Frontend Streaming: New frontend functionality to opt to stream the audio (while also being able to download the full audio at the end of the stream)
Updated
- Python tests for the SSE events and for the frontend streaming
[1.5.0] - 2025-06-23
Added
- Memory Management Page: New frontend page for monitoring and managing backend memory usage
- Memory Management API Endpoints: New endpoints for memory info, cleanup, reset, and recommendations
- Navigation System: Added Wouter-based routing between TTS and Memory Management pages
- Real-time Memory Monitoring: Live tracking of CPU and GPU memory usage with trend charts
- Memory Alerts & Recommendations: Intelligent alerts and optimization suggestions
Enhanced
- Frontend Polish: Improved color schemes and visual consistency across components
- Memory Cleanup: More robust memory management with configurable cleanup intervals
- User Experience: Better responsive design and error handling
[1.4.1] - 2025-06-22
Added
- Voice Alias System: Add multiple aliases to any voice with persistent storage and UI management
- OpenAI Voice Mapping: Map OpenAI voice names (alloy, echo, fable, etc.) to custom voices via aliases
- Alias API Endpoints: New endpoints for adding, removing, and listing voice aliases
- Alias UI Management: Visual alias badges with inline editing and one-click removal
Fixed
- Default Voice Persistence: Fixed default voice settings not persisting across frontend and backend sessions
- Voice Resolution: Improved voice name resolution to handle both direct names and aliases
- Voice Library State: Better synchronization between frontend and backend voice library state
- Error Handling: Enhanced error handling for voice operations and alias conflicts
[1.4.0] - 2025-06-18
Added
Voice Library Management System
- Comprehensive Voice CRUD Operations:
- Voice upload with automatic validation and storage
- Voice retrieval with metadata and file serving
- Voice deletion with cleanup and validation
- Voice renaming with conflict detection
- List all voices with detailed metadata
- Voice Storage Infrastructure:
- Persistent voice storage with configurable directories
- Environment-based configuration for voice storage paths
- Automatic voice file organization and cleanup
- Safe file handling with validation and sanitization
New API Endpoints
- Voice Management Endpoints:
POST /voices/upload
- Upload custom voice samples with metadataGET /voices
- List all available voices with detailsGET /voices/{voice_name}
- Retrieve specific voice informationGET /voices/{voice_name}/file
- Download voice fileDELETE /voices/{voice_name}
- Delete voice with confirmationPUT /voices/{voice_name}/rename
- Rename voice with validation
- Enhanced Health Checks:
- Improved health endpoint with voice storage validation
- System status checks including voice directory health
- Enhanced API information with voice library status
Default Voice Management
- Default Voice System:
- Configurable default voice selection and management
- Automatic fallback to system default when custom default unavailable
- Default voice validation and health monitoring
- Environment-based default voice configuration
- Voice Configuration:
- Persistent default voice settings across sessions
- Voice availability validation and error handling
- Seamless integration with existing TTS endpoints
Frontend Voice Library Integration
- Voice Library Components:
- Interactive voice library management interface
- Voice upload modal with validation and feedback
- Voice selection dropdown with default voice indicator
- Voice deletion confirmation with safety checks
- Voice Library Management:
- Real-time voice library updates and synchronization
- Voice metadata display including file size and type
- Drag-and-drop voice upload support
- Voice preview and testing capabilities
- Default Voice Settings:
- Default voice selector with persistent settings
- Visual indicators for default voice status
- Automatic UI updates when default voice changes
- Voice availability status in the interface
Enhanced
Backend Improvements
- Voice Storage Architecture:
- Robust file system integration with proper error handling
- Thread-safe voice operations with proper locking
- Enhanced validation for voice file formats and quality
- Automatic cleanup of orphaned voice files
- Configuration Management:
- Environment variable support for voice storage configuration
- Configurable voice directory paths and permissions
- Enhanced configuration validation and error reporting
- Better default value management for voice settings
API Architecture
- Enhanced Request Validation:
- Comprehensive voice parameter validation
- File upload validation with size and format checks
- Voice name sanitization and conflict detection
- Improved error messages and status codes
- Response Models:
- Structured voice metadata response models
- Enhanced voice list responses with detailed information
- Better error response formatting for voice operations
- Consistent API response patterns across voice endpoints
Frontend Enhancements
- User Experience Improvements:
- Intuitive voice management workflows
- Real-time feedback for voice operations
- Enhanced loading states and progress indicators
- Better error handling and user notifications
- State Management:
- Centralized voice library state management
- Optimistic updates for voice operations
- Real-time synchronization with backend voice changes
- Persistent voice preferences and settings
Technical Details
New Functions Added
upload_voice()
- Voice upload handler with validationget_voices()
- Voice listing with metadata retrievalget_voice_file()
- Voice file serving and downloaddelete_voice()
- Safe voice deletion with cleanuprename_voice()
- Voice renaming with conflict checkingset_default_voice()
- Default voice managementvalidate_voice_file()
- Voice file validation and processing
Configuration Enhancements
- Voice storage directory configuration via environment variables
- Default voice management through configuration system
- Enhanced health checks with voice storage validation
- Voice-related environment variable documentation
Frontend Components Added
VoiceLibrary
- Main voice library management componentVoiceUploadModal
- Voice upload interface with validationuseVoiceLibrary
- Custom hook for voice library operationsuseDefaultVoice
- Custom hook for default voice management- Voice-related utilities and helpers for file handling
Infrastructure
- Environment Configuration: Enhanced environment variable support for voice storage
- Documentation: Updated API documentation with voice library endpoints
- Testing: Extended test coverage for voice management operations
- Validation: Comprehensive voice file and parameter validation
[1.3.0] - 2025-06-15
Added
Real-time Audio Streaming
- New Streaming Endpoints:
POST /audio/speech/stream
- Real-time streaming with configured voice samplePOST /audio/speech/stream/upload
- Real-time streaming with custom voice upload- Chunked transfer encoding for true streaming experience
- WAV header optimization for streaming compatibility
Advanced Streaming Features
- Streaming Strategies:
sentence
(default) - Splits at sentence boundaries for natural flowparagraph
- Splits at paragraph breaks for longer contextsfixed
- Fixed character count chunks for predictable timingword
- Word-boundary splitting for maximum responsiveness
- Quality Presets:
fast
mode - Optimized for low latency with smaller chunksbalanced
mode (default) - Good balance of speed and qualityhigh
mode - Larger chunks for better audio quality
- Configurable Parameters:
streaming_chunk_size
(50-500) - Characters per streaming chunkstreaming_buffer_size
(1-10) - Number of chunks to bufferstreaming_quality
- Quality preset selection
Enhanced Text Processing
- Streaming-Optimized Text Splitting:
split_text_for_streaming()
function with strategy-based chunkingget_streaming_settings()
for optimized parameter management- Enhanced sentence, paragraph, and word boundary detection
- Smart long-sentence splitting with natural break points
- Memory-Efficient Processing:
- Chunk-by-chunk processing to reduce memory footprint
- Automatic cleanup of processed audio chunks
- Progressive WAV streaming without full concatenation
Progress Monitoring Integration
- Real-time Progress Tracking:
- Enhanced status system with streaming-specific progress updates
- Chunk-by-chunk progress reporting with percentage completion
- Strategy-aware progress descriptions (e.g., "Streaming audio for chunk 3/8 (sentence strategy)")
- Memory usage tracking during streaming operations
Documentation & Examples
- Comprehensive Streaming Documentation:
- New
docs/STREAMING_API.md
with detailed streaming guide - Performance comparison between streaming and standard generation
- Advanced usage examples in Python, JavaScript, and cURL
- Real-time playback examples with pyaudio and Web Audio API
- New
- Updated Main Documentation:
- Streamlined streaming section in main README
- Clear navigation to detailed streaming documentation
- Updated API endpoints table with streaming links
Enhanced
Backend Improvements
- Streaming Implementation:
generate_speech_streaming()
async generator for true streaming- WAV header handling for streaming compatibility
- Memory management optimized for streaming workflows
- Automatic cleanup of temporary voice files during streaming
- Error Handling:
- Streaming-specific error handling and status updates
- Graceful fallback for streaming failures
- Enhanced validation for streaming parameters
Performance Optimizations
- Memory Management:
- Reduced memory usage through progressive processing
- Automatic tensor cleanup during streaming
- CUDA memory optimization for streaming workflows
- Latency Improvements:
- First audio chunk available in 1-2 seconds
- Progressive audio delivery eliminates wait times
- Optimized chunking strategies for different use cases
Technical Details
New Functions Added
generate_speech_streaming()
- Core streaming generator functionsplit_text_for_streaming()
- Strategy-based text splitting for streamingget_streaming_settings()
- Optimized parameter management_split_by_paragraphs()
,_split_by_sentences()
,_split_by_words()
,_split_by_fixed_size()
- Strategy implementations
API Enhancements
- Streaming Request Models:
- Extended
TTSRequest
model with streaming parameters - Validation for streaming strategies and quality presets
- Form-data support for streaming with voice upload
- Extended
- Response Handling:
- Chunked transfer encoding headers
- Streaming-optimized content headers
- Progressive WAV format streaming
Frontend Integration
- Updated Examples:
- Real-time streaming examples in multiple languages
- Progress monitoring integration examples
- Performance optimization guidelines
- API Reference:
- Complete streaming parameter documentation
- Performance comparison tables
- Troubleshooting guide for streaming issues
Infrastructure
- Testing: Extended test coverage for streaming endpoints
- Documentation: Comprehensive streaming API documentation
- Examples: Production-ready streaming integration examples
[1.2.0] - 2025-06-14
Added
- Version management system with automatic version reading from
pyproject.toml
- Frontend and backend version display in the UI
- Comprehensive changelog documentation
Status API & Monitoring
- Real-time TTS Processing Status API with comprehensive endpoints:
/status
- Main status endpoint with configurable details/status/progress
- Lightweight progress monitoring/status/statistics
- Processing metrics and performance stats/status/history
- Request history with detailed records/info
- Comprehensive API information endpoint
- Advanced Status Tracking System:
- Thread-safe status manager with request lifecycle tracking
- Progress percentage calculation with chunk-by-chunk monitoring
- Processing statistics including success rates and performance metrics
- Request history with detailed metadata and text previews
- Memory usage tracking integration
- Enhanced API Information:
- Version management with automatic
pyproject.toml
integration - Comprehensive API metadata including author, license, and platform info
- Real-time server status and configuration details
- Version management with automatic
Frontend Enhancements
- Modern UI Framework Integration:
- Full shadcn/ui component library integration
- Responsive design with mobile-first approach
- Improved color system with dark/light mode support
- Enhanced accessibility with proper ARIA labels and focus management
- Advanced Status Monitoring Dashboard:
- Real-time status header with live processing updates
- Interactive statistics panel with processing metrics
- Progress overlay with visual chunk progression
- Request history viewer with detailed request information
- Enhanced User Experience:
- Responsive dialog-drawer modal system for mobile optimization
- Improved form controls with better validation feedback
- Advanced settings panel with real-time parameter adjustment
- Voice library management with upload, rename, and delete capabilities
- State Management Improvements:
- Centralized API state management with React Query
- Real-time data synchronization across components
- Optimistic updates for better user experience
- Comprehensive error handling and retry mechanisms
Enhanced
Backend Improvements
- Memory Management System:
- Advanced memory monitoring with detailed usage tracking
- Automatic cleanup routines for CUDA and CPU memory
- Memory optimization for long-running processes
- Configurable cleanup intervals and monitoring settings
- API Architecture:
- Comprehensive endpoint aliasing system for backward compatibility
- Enhanced request/response validation with Pydantic models
- Improved error handling and status reporting
- Better FastAPI documentation with detailed endpoint descriptions
Developer Experience
- Docker Infrastructure:
- Multiple Docker Compose configurations for different deployment scenarios
- GPU-optimized containers with CUDA support
- uv-based builds for faster dependency resolution
- Production-ready container configurations
- Configuration Management:
- Environment variable validation and type checking
- Comprehensive configuration endpoints for debugging
- Better default values and configuration documentation
- Runtime configuration updates for development
Technical Details
API Endpoints Added
GET /status
- Main status with optional memory, history, and statsGET /status/progress
- Lightweight progress monitoringGET /status/statistics
- Processing metrics and performance dataGET /status/history
- Request history with configurable limitsPOST /status/history/clear
- Clear request history with confirmationGET /info
- Comprehensive API information and version details
Frontend Components Added
StatusHeader
- Real-time status display with API versionStatusProgressOverlay
- Visual progress tracking during generationStatusStatisticsPanel
- Interactive statistics dashboarduseStatusMonitoring
- Custom hook for real-time status updates- Version management utilities for frontend/backend version tracking
Configuration Enhancements
- Version management with
app/core/version.py
- Enhanced
/config
endpoint with API metadata - Frontend version utilities with build-time integration
- Comprehensive version display in UI components
Infrastructure
- Versioning System: Automatic version reading from
pyproject.toml
with fallback support - Build Optimization: uv integration for faster builds and better dependency resolution
- Documentation: Enhanced API documentation with detailed endpoint descriptions
- Testing: Extended test suite for status endpoints and memory management
[1.0.0] - 2025-06-XX
Added
- Initial release of Chatterbox TTS API
- FastAPI-based REST API with OpenAI compatibility
- Voice cloning capabilities with custom voice samples
- Docker containerization support
- React-based web frontend
- Basic health monitoring and configuration endpoints
- Memory management system
- Text processing with automatic chunking
- Multi-format audio support (MP3, WAV, FLAC, M4A, OGG)
Core Features
- OpenAI-compatible
/v1/audio/speech
endpoint - Voice file upload support via
/v1/audio/speech/upload
- Health check endpoint
/health
- Models listing endpoint
/v1/models
- Configuration endpoint
/config
- Memory management endpoint
/memory
Frontend Features
- Text-to-speech interface with parameter controls
- Voice library management
- Audio history and playback
- Advanced settings panel
- Theme support (light/dark mode)
- Responsive design
Version Format
This project uses semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR: Incompatible API changes
- MINOR: New functionality in a backwards compatible manner
- PATCH: Backwards compatible bug fixes
Backend vs Frontend Versioning
- Backend API Version: Follows the main project version (currently 1.2.0)
- Frontend Version: Independent versioning for UI updates (currently 1.1.0)
- Display: Backend version shown in status header, frontend version in footer