Open Source · Apache-2.0 · 3.5K Stars
Voice Cloning with LuxTTS
Clone any voice from a 3-second sample. 150x realtime speed, 48kHz output, under 1GB VRAM. One of the fastest open-source TTS models available.
- 150x realtime speed
- 48kHz audio quality
- <1GB VRAM required
- 3 sec minimum reference
Quick Start
# Clone and install
git clone https://github.com/ysharma3501/LuxTTS.git
cd LuxTTS
pip install -r requirements.txt
# Load model (choose your device)
from zipvoice.luxvoice import LuxTTS
lux_tts = LuxTTS('YatharthS/LuxTTS', device='cuda')
# or device='cpu' | device='mps' (Mac)
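
If you would rather not hard-code the device string, it can be chosen at runtime. The sketch below assumes LuxTTS runs on PyTorch (the `.numpy()` call in the generation example suggests it does). `pick_device` is a hypothetical helper, not part of LuxTTS, and it falls back to `'cpu'` when PyTorch cannot be imported.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what this machine offers."""
    try:
        import torch  # assumption: LuxTTS uses PyTorch under the hood
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
# The result can then be passed to the loader shown above:
# lux_tts = LuxTTS('YatharthS/LuxTTS', device=device)
```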
Generate Speech in 4 Lines
import soundfile as sf
# Encode reference voice (min 3 seconds)
encoded = lux_tts.encode_prompt('reference.wav', rms=0.01)
# Generate speech
wav = lux_tts.generate_speech("Hello world!", encoded, num_steps=4)
# Save (48kHz)
sf.write('output.wav', wav.numpy().squeeze(), 48000)
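
To check the advertised speed on your own hardware, you can compare the length of the generated audio with the time it took to produce it. The following is a minimal sketch: `realtime_factor` and `timed` are hypothetical helpers, and `fake_generate` is a stand-in for the real `generate_speech` call so the example runs without the model installed.

```python
import time

SAMPLE_RATE = 48_000  # LuxTTS output rate, as stated on this page


def realtime_factor(num_samples: int, elapsed_seconds: float,
                    sample_rate: int = SAMPLE_RATE) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (num_samples / sample_rate) / elapsed_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate(text: str) -> list:
    """Stand-in for the model: returns 2 seconds of silence."""
    return [0.0] * (2 * SAMPLE_RATE)


samples, elapsed = timed(fake_generate, "Hello world!")
if elapsed > 0:
    print(f"{realtime_factor(len(samples), elapsed):.1f}x realtime")
```

With the real model, you would wrap the `generate_speech` call in `timed` and pass the number of samples in the returned waveform. The first call on a GPU usually includes warm-up cost, so timing a second or third call tends to give a more representative number.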
Tuning Parameters
| Parameter | Default | Description |
|---|---|---|
| rms | 0.01 | Volume level. Higher = louder. 0.01 recommended. |
| t_shift | 0.9 | Sampling quality. Higher = better sound, more pronunciation errors. |
| num_steps | 4 | Number of sampling steps. Higher = better quality but slower. 3-4 is optimal for speed/quality balance. |
| speed | 1.0 | Playback speed. Lower = slower speech. |
| return_smooth | False | Smoother output. Use True if you hear metallic sounds. |
| ref_duration | 5 | Reference clip duration in seconds. Lower = faster. Set to 1000 if you hear artifacts. |
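
One way to keep these settings together is a small config object that starts from the defaults in the table and changes one value per symptom. This is only a sketch: `TuningParams` and `adjust_for` are hypothetical names, the symptom labels are made up for illustration, and the 0.1 step used to lower `t_shift` is an arbitrary assumption. How each value is passed to LuxTTS is not covered here beyond the `rms` and `num_steps` arguments shown in the generation example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TuningParams:
    """Defaults copied from the table above."""
    rms: float = 0.01
    t_shift: float = 0.9
    num_steps: int = 4
    speed: float = 1.0
    return_smooth: bool = False
    ref_duration: int = 5


def adjust_for(params: TuningParams, symptom: str) -> TuningParams:
    """Return a copy of params with the change the table suggests for a symptom."""
    if symptom == "metallic":
        return replace(params, return_smooth=True)
    if symptom == "artifacts":
        return replace(params, ref_duration=1000)
    if symptom == "mispronunciation":
        # Assumption: step down by 0.1 and never go below 0.1.
        return replace(params, t_shift=round(max(0.1, params.t_shift - 0.1), 2))
    raise ValueError(f"unknown symptom: {symptom!r}")


defaults = TuningParams()
print(adjust_for(defaults, "metallic"))
print(adjust_for(defaults, "mispronunciation"))
```

Because the dataclass is frozen, each adjustment returns a new object and the defaults stay available for comparison.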
Use Cases
- Video Tutorials: Generate voiceovers for educational content with a consistent brand voice
- Podcast Intros: Create professional intros and outros without recording sessions
- Product Demos: Narrate product walkthroughs in multiple languages
- Customer Support: Generate audio responses for IVR systems and help docs
- Content at Scale: Convert blog posts to audio articles automatically
- Accessibility: Add audio versions to written content for visually impaired users
How LuxTTS Compares
| Feature | LuxTTS | ElevenLabs | Coqui TTS |
|---|---|---|---|
| Price | Free (open source) | $5-99/mo | Free (open source) |
| Output sample rate | 48kHz | 44.1kHz | 24kHz |
| Speed | 150x realtime | API-dependent | 10-50x realtime |
| VRAM | <1GB | Cloud-based | 2-4GB |
| Self-hosted | Yes | No | Yes |
| Voice cloning | 3s sample | 30s+ sample | 5s+ sample |
Tips
1. Use a reference audio file of at least 3 seconds for voice cloning. Longer samples improve accuracy.
2. If you hear metallic sounds, set return_smooth=True.
3. Lower t_shift for fewer pronunciation errors (at the cost of quality).
4. Float16 inference (coming soon) is expected to nearly double speed.
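
The 3-second minimum can be checked before a clip is handed to the model. The sketch below uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files and nothing else. `wav_duration_seconds`, `is_long_enough`, and `write_silence` are hypothetical helpers, and the silent demo clip exists only so the example runs on its own.

```python
import os
import tempfile
import wave

MIN_REFERENCE_SECONDS = 3.0  # minimum reference length stated on this page


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: frame count divided by frame rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True if the clip meets the minimum reference length."""
    return wav_duration_seconds(path) >= minimum


def write_silence(path: str, seconds: float, sample_rate: int = 16_000) -> None:
    """Write a silent mono 16-bit clip (demo data only)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))


tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "reference_demo.wav")
write_silence(demo_path, 4.0)
print(f"{wav_duration_seconds(demo_path):.1f}s, usable: {is_long_enough(demo_path)}")
```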