AI Starter Package
Open Source · Apache-2.0 · 3.5K Stars

Voice Cloning with LuxTTS

Clone a voice from a reference sample as short as 3 seconds. LuxTTS runs at 150x realtime, outputs 48kHz audio, and needs under 1GB of VRAM. The fastest open-source TTS model available.

150x Realtime Speed · 48kHz Audio Quality · <1GB VRAM Required · 3sec Min Reference

Quick Start

# Clone and install (shell)
git clone https://github.com/ysharma3501/LuxTTS.git
cd LuxTTS
pip install -r requirements.txt

# Load model in Python (choose your device)
from zipvoice.luxvoice import LuxTTS
lux_tts = LuxTTS('YatharthS/LuxTTS', device='cuda')
# or device='cpu' | device='mps' (Mac)
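
The device string can also be chosen at runtime instead of hard-coded. The sketch below assumes PyTorch is the backend, since the device names in the Quick Start match PyTorch's. pick_device is a hypothetical helper, not part of LuxTTS.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without PyTorch we cannot probe for accelerators, so fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(device)
```

The result can then be passed as LuxTTS('YatharthS/LuxTTS', device=device).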

Generate Speech in 4 Lines

import soundfile as sf
# Encode reference voice (min 3 seconds)
encoded = lux_tts.encode_prompt('reference.wav', rms=0.01)
# Generate speech
wav = lux_tts.generate_speech("Hello world!", encoded, num_steps=4)
# Save (48kHz)
sf.write('output.wav', wav.numpy().squeeze(), 48000)
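
When several lines share one voice, the reference can probably be encoded once and reused. The sketch below assumes the object returned by encode_prompt is reusable across generate_speech calls, which the source does not confirm. synthesize_lines and its write_wav argument are hypothetical names. The model and the writer are passed in so the helper makes no assumptions beyond the two calls shown above.

```python
from pathlib import Path


def synthesize_lines(tts, reference_path, lines, out_dir, write_wav,
                     num_steps=4, sample_rate=48000):
    """Encode the reference once, then write one WAV file per text line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the encoded prompt can be reused for every line.
    encoded = tts.encode_prompt(reference_path, rms=0.01)
    paths = []
    for index, text in enumerate(lines, start=1):
        wav = tts.generate_speech(text, encoded, num_steps=num_steps)
        path = out_dir / f"line_{index:03d}.wav"
        # write_wav takes (path, samples, sample_rate), like soundfile.write.
        write_wav(str(path), wav.numpy().squeeze(), sample_rate)
        paths.append(path)
    return paths
```

With the objects from the snippets above, a call might look like synthesize_lines(lux_tts, 'reference.wav', ["Hello.", "Goodbye."], 'out', sf.write).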

Tuning Parameters

Parameter     | Default | Description
rms           | 0.01    | Volume level. Higher = louder. 0.01 recommended.
t_shift       | 0.9     | Sampling quality. Higher = better sound, but more pronunciation errors.
num_steps     | 4       | Quality steps. 3-4 is optimal for speed/quality balance.
speed         | 1.0     | Playback speed. Lower = slower speech.
return_smooth | False   | Smoother output. Use True if you hear metallic sounds.
ref_duration  | 5       | Reference clip duration. Lower = faster. Set to 1000 if you hear artifacts.
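
The troubleshooting advice in the table can be captured as a small settings helper. This is only a sketch. The table does not say which LuxTTS call accepts each parameter, so the code tracks values and does not pass them to the model. The symptom names and the 0.1 step for t_shift are hypothetical choices.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "rms": 0.01,
    "t_shift": 0.9,
    "num_steps": 4,
    "speed": 1.0,
    "return_smooth": False,
    "ref_duration": 5,
}


def adjust_for(symptom, params=None, t_shift_step=0.1):
    """Return a copy of params adjusted for one symptom, per the table's advice."""
    tuned = dict(DEFAULTS if params is None else params)
    if symptom == "metallic":
        tuned["return_smooth"] = True
    elif symptom == "artifacts":
        tuned["ref_duration"] = 1000
    elif symptom == "mispronunciation":
        # Lower t_shift trades some sound quality for fewer pronunciation errors.
        # The step size is an illustrative assumption, not a documented value.
        tuned["t_shift"] = round(tuned["t_shift"] - t_shift_step, 3)
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    return tuned


print(adjust_for("metallic"))
```

Returning a copy leaves DEFAULTS untouched, so adjustments can be chained by passing the result back in as params.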

Use Cases

Video Tutorials

Generate voiceovers for educational content with consistent brand voice

Podcast Intros

Create professional intros and outros without recording sessions

Product Demos

Narrate product walkthroughs in multiple languages

Customer Support

Generate audio responses for IVR systems and help docs

Content at Scale

Convert blog posts to audio articles automatically

Accessibility

Add audio versions to written content for visually impaired users
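
For the blog-to-audio case, long text usually needs to be split before synthesis. The sketch below groups sentences into chunks using only the standard library. split_into_chunks and the max_chars limit are hypothetical: the source states no maximum input length for LuxTTS.

```python
import re


def split_into_chunks(text, max_chars=300):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-sentence.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


post = "LuxTTS clones voices. It runs fast! Does it need much VRAM? No."
print(split_into_chunks(post, max_chars=40))
# ['LuxTTS clones voices. It runs fast!', 'Does it need much VRAM? No.']
```

Each chunk can then be synthesized separately and the audio concatenated in order.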

How LuxTTS Compares

Feature               | LuxTTS             | ElevenLabs    | Coqui TTS
Price                 | Free (open source) | $5-99/mo      | Free (open source)
Quality (sample rate) | 48kHz              | 44.1kHz       | 24kHz
Speed                 | 150x realtime      | API-dependent | 10-50x realtime
VRAM                  | <1GB               | Cloud-based   | 2-4GB
Self-hosted           | Yes                | No            | Yes
Voice cloning         | 3s sample          | 30s+ sample   | 5s+ sample
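
To make the speed row concrete: a realtime factor of N means one second of compute yields N seconds of audio. At 150x, a 60-second clip takes about 0.4 seconds to generate. The sketch below does that arithmetic and shows one way to measure the factor on your own hardware. Both function names are hypothetical, and the table's figures are taken at face value.

```python
def synthesis_seconds(audio_seconds, realtime_factor):
    """Compute time needed to generate audio_seconds of speech."""
    return audio_seconds / realtime_factor


def measured_realtime_factor(num_samples, sample_rate, elapsed_seconds):
    """Realtime factor from an actual run: audio duration divided by wall time."""
    return (num_samples / sample_rate) / elapsed_seconds


# One minute of audio at the speeds quoted in the table.
for factor in (150, 50, 10):
    print(factor, synthesis_seconds(60, factor))
```

To measure a real run, time the generation call with time.perf_counter() and pass the number of output samples along with the 48000 sample rate.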

Tips

  1. Use a reference clip of at least 3 seconds for voice cloning. Longer samples improve accuracy.
  2. If you hear metallic sounds, set return_smooth=True.
  3. Lower t_shift for fewer pronunciation errors (at the cost of sound quality).
  4. Float16 inference (coming soon) is expected to nearly double speed.
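
The 3-second minimum can be checked before encoding. The sketch below uses Python's standard wave module, which reads uncompressed PCM WAV files only. The helper names are hypothetical, and other formats would need a different reader.

```python
import wave

MIN_REFERENCE_SECONDS = 3.0  # minimum reference length stated in the tips


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path, minimum=MIN_REFERENCE_SECONDS):
    """True if the clip meets the minimum reference duration."""
    return wav_duration_seconds(path) >= minimum
```

Calling is_long_enough('reference.wav') before encode_prompt catches clips that are too short.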
