Open Source · Apache-2.0 · 3.5K Stars

Voice Cloning with LuxTTS

Clone any voice from a 3-second sample. 150x realtime speed, 48kHz crystal-clear output, under 1GB VRAM. The fastest open-source TTS model available.

150x Realtime Speed

48kHz Audio Quality

<1GB VRAM Required

3sec Min Reference
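To put the headline figures in concrete terms, here is a small sketch of the arithmetic. It assumes "150x realtime" means 150 seconds of audio produced per second of compute; the helper names are illustrative and not part of LuxTTS, and real throughput will depend on hardware.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float = 150.0) -> float:
    """Estimated compute time for a clip at a given realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def sample_count(audio_seconds: float, sample_rate: int = 48_000) -> int:
    """Number of samples in a mono clip at the given sample rate."""
    return round(audio_seconds * sample_rate)


if __name__ == "__main__":
    # One minute of speech at the advertised 150x factor and 48kHz output.
    print(synthesis_seconds(60.0))  # 0.4
    print(sample_count(60.0))       # 2880000
```

Under that assumption, a one-minute narration would take roughly 0.4 seconds to generate and contain 2,880,000 samples.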

Quick Start

# Clone and install (shell)
git clone https://github.com/ysharma3501/LuxTTS.git
cd LuxTTS
pip install -r requirements.txt

# Load model in Python (choose your device)
from zipvoice.luxvoice import LuxTTS
lux_tts = LuxTTS('YatharthS/LuxTTS', device='cuda')
# or device='cpu' | device='mps' (Mac)
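Rather than hard-coding the device string, it can be chosen at runtime. The following is a minimal sketch, assuming PyTorch is the backend (the helper names pick_device and detect_device are hypothetical, not part of LuxTTS). It prefers CUDA, then MPS, then falls back to CPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string: cuda, then mps, then cpu."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends; fall back to cpu without it."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

The result can then be passed as the device argument, for example LuxTTS('YatharthS/LuxTTS', device=detect_device()).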

Generate Speech in 4 Lines

import soundfile as sf

# Encode reference voice (min 3 seconds)
encoded = lux_tts.encode_prompt('reference.wav', rms=0.01)

# Generate speech
wav = lux_tts.generate_speech("Hello world!", encoded, num_steps=4)

# Save (48kHz)
sf.write('output.wav', wav.numpy().squeeze(), 48000)
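When several lines need the same voice, the reference only has to be encoded once. The sketch below uses only the two calls shown above (encode_prompt and generate_speech); the wrapper name synthesize_lines is hypothetical, and it assumes an encoded prompt can be reused across calls.

```python
from typing import Any, Iterable, List


def synthesize_lines(tts: Any, reference_path: str, lines: Iterable[str],
                     num_steps: int = 4, rms: float = 0.01) -> List[Any]:
    """Encode the reference voice once, then generate one clip per line."""
    # Assumption: the encoded prompt is reusable for multiple generations.
    encoded = tts.encode_prompt(reference_path, rms=rms)
    clips = []
    for text in lines:
        if not text.strip():
            continue  # skip empty lines instead of synthesizing silence
        clips.append(tts.generate_speech(text, encoded, num_steps=num_steps))
    return clips
```

Each returned clip can be saved exactly as in the snippet above, with one sf.write call per clip at 48000 Hz.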

Tuning Parameters

Parameter	Default	Description
rms	0.01	Volume level. Higher = louder; 0.01 is recommended.
t_shift	0.9	Sampling quality. Higher = better sound but more pronunciation errors.
num_steps	4	Number of sampling steps. 3-4 gives the best speed/quality balance.
speed	1.0	Speech speed. Lower = slower speech.
return_smooth	False	Smoother output. Set to True if you hear metallic sounds.
ref_duration	5	Reference clip duration. Lower = faster. Set to 1000 if you hear artifacts.
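One way to keep these settings together is a small config object that carries the defaults from the table. This is a sketch only: TuningParams is a hypothetical name, the validation bounds are assumptions, and the examples on this page confirm only that rms goes to encode_prompt and num_steps to generate_speech, so check the LuxTTS source before forwarding the other fields.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TuningParams:
    """Defaults copied from the tuning table; bounds below are assumptions."""
    rms: float = 0.01
    t_shift: float = 0.9
    num_steps: int = 4
    speed: float = 1.0
    return_smooth: bool = False
    ref_duration: int = 5

    def __post_init__(self) -> None:
        # Basic sanity checks (assumed, not documented LuxTTS limits).
        if self.rms <= 0:
            raise ValueError("rms must be positive")
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")
        if self.speed <= 0:
            raise ValueError("speed must be positive")
        if self.ref_duration <= 0:
            raise ValueError("ref_duration must be positive")


if __name__ == "__main__":
    slow_and_smooth = TuningParams(speed=0.9, return_smooth=True)
    print(asdict(slow_and_smooth))
```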

Use Cases

Video Tutorials

Generate voiceovers for educational content with consistent brand voice

Podcast Intros

Create professional intros and outros without recording sessions

Product Demos

Narrate product walkthroughs in multiple languages

Customer Support

Generate audio responses for IVR systems and help docs

Content at Scale

Convert blog posts to audio articles automatically

Accessibility

Add audio versions to written content for visually impaired users
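For long-form use cases such as audio articles, the text usually has to be split into shorter passages before synthesis. The sketch below groups sentences into chunks under a character budget; chunk_text is a hypothetical helper and the 300-character default is an arbitrary assumption, not a documented LuxTTS limit.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 300) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(chunk_text("One. Two. Three.", max_chars=9))  # ['One. Two.', 'Three.']
```

Each chunk can then be passed to generate_speech with the same encoded reference voice.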

How LuxTTS Compares

Feature	LuxTTS	ElevenLabs	Coqui TTS
Price	Free (open source)	$5-99/mo	Free (open source)
Output sample rate	48kHz	44.1kHz	24kHz
Speed	150x realtime	API-dependent	10-50x realtime
VRAM	<1GB	Cloud-based	2-4GB
Self-hosted	Yes	No	Yes
Voice cloning	3s sample	30s+ sample	5s+ sample

Tips

1. Use a reference clip of at least 3 seconds for voice cloning. Longer samples improve accuracy.
2. If you hear metallic sounds, set return_smooth=True.
3. Lower t_shift for fewer pronunciation errors (at the cost of sound quality).
4. Float16 inference (coming soon) is expected to nearly double speed.
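The tips above can be expressed as a small troubleshooting helper. This is a sketch under stated assumptions: adjust_for and its symptom labels are hypothetical, and the 0.1 step used to lower t_shift is an arbitrary choice, not a recommended value.

```python
from typing import Any, Dict


def adjust_for(symptom: str, params: Dict[str, Any],
               t_shift_step: float = 0.1) -> Dict[str, Any]:
    """Return a copy of params adjusted for one reported symptom."""
    updated = dict(params)  # never mutate the caller's settings
    if symptom == "metallic":
        updated["return_smooth"] = True
    elif symptom == "mispronunciation":
        # Lower t_shift trades some sound quality for fewer errors.
        updated["t_shift"] = round(max(0.0, updated["t_shift"] - t_shift_step), 2)
    elif symptom == "artifacts":
        updated["ref_duration"] = 1000
    else:
        raise ValueError(f"unknown symptom: {symptom}")
    return updated


if __name__ == "__main__":
    current = {"t_shift": 0.9, "return_smooth": False, "ref_duration": 5}
    print(adjust_for("mispronunciation", current))
```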

