Free Audio Transcription with Insanely Fast Whisper
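Transcribe 150 minutes of audio in under 98 seconds (the project's published benchmark, measured on an Nvidia A100 80GB). Zero cost, fully local, with speaker diarization and translation from 99 languages into English. One of the fastest Whisper implementations available.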
Why Pay for Transcription?
Most transcription services charge per minute. Insanely Fast Whisper runs locally on your GPU for free.
| Service | Per Minute | Per Hour | Note |
|---|---|---|---|
| OpenAI Whisper API | $0.006/min | $0.36/hr | Cloud-dependent |
| Google Speech-to-Text | $0.024/min | $1.44/hr | Premium tier |
| Rev (Human) | $1.50/min | $90.00/hr | Manual turnaround |
| Otter.ai | n/a (~$8.33/mo flat) | n/a ($100/yr flat) | Subscription required, not metered |
| AWS Transcribe | $0.024/min | $1.44/hr | Cloud-dependent |
| Insanely Fast Whisper | $0 | $0 | Open source, local |
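To make the table concrete, here is a small sketch that turns the per-minute rates above into a total for a given workload. The 100-hour workload is a hypothetical example, Otter.ai is left out because it is billed as a flat subscription, and the local figure ignores hardware and electricity.

```python
# Per-minute list prices copied from the table above (USD).
RATES_PER_MINUTE = {
    "OpenAI Whisper API": 0.006,
    "Google Speech-to-Text": 0.024,
    "Rev (Human)": 1.50,
    "AWS Transcribe": 0.024,
    "Insanely Fast Whisper": 0.0,  # local GPU; hardware and power not counted
}


def transcription_cost(service: str, hours: float) -> float:
    """Return the cost in USD of transcribing `hours` of audio with `service`."""
    return round(RATES_PER_MINUTE[service] * hours * 60, 2)


# Hypothetical workload: 100 hours of recorded audio.
for name in RATES_PER_MINUTE:
    print(f"{name:<24} ${transcription_cost(name, 100):>10,.2f}")
```

At these rates the gap only becomes meaningful at volume: the metered APIs stay in the tens or hundreds of dollars for 100 hours, while human transcription runs into the thousands.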
Quick Start
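The tool ships as a command-line program, typically installed with pipx install insanely-fast-whisper. The sketch below assembles a basic transcription call from Python using the CLI flags as I understand them (--file-name, --transcript-path, --model-name, --batch-size, --flash). The file name interview.mp3 is hypothetical, and the helper function is illustrative, not part of the project.

```python
import shlex


def build_transcribe_command(
    audio_path: str,
    transcript_path: str = "output.json",
    model_name: str = "openai/whisper-large-v3",
    batch_size: int = 24,
    use_flash: bool = False,
) -> list[str]:
    """Assemble an insanely-fast-whisper CLI call as an argument list."""
    return [
        "insanely-fast-whisper",
        "--file-name", audio_path,
        "--transcript-path", transcript_path,
        "--model-name", model_name,
        "--batch-size", str(batch_size),
        "--flash", str(use_flash),  # the CLI expects the literal True or False
    ]


# Hypothetical input file; print the command so it can be inspected or pasted.
cmd = build_transcribe_command("interview.mp3", use_flash=True)
print(shlex.join(cmd))

# To actually run it (requires the CLI and a supported GPU):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the call as a list rather than a single string avoids quoting problems when file names contain spaces.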
Speaker Diarization
Automatically identify who said what. Essential for interviews, meetings, and multi-speaker podcasts.
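Diarization is switched on by passing a Hugging Face token to the CLI. The sketch below builds the extra flags, assuming the CLI's --hf-token, --num-speakers, --min-speakers and --max-speakers options, and assuming that an exact speaker count cannot be combined with a min/max range. The helper name and meeting.wav are hypothetical.

```python
from typing import Optional


def diarization_flags(
    hf_token: str,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> list[str]:
    """Return the CLI flags that enable speaker diarization."""
    if not hf_token:
        raise ValueError("A Hugging Face token is required for diarization")
    has_range = min_speakers is not None or max_speakers is not None
    if num_speakers is not None and has_range:
        # Assumption: the CLI rejects an exact count combined with a range.
        raise ValueError("Use either num_speakers or min/max speakers, not both")

    flags = ["--hf-token", hf_token]
    if num_speakers is not None:
        flags += ["--num-speakers", str(num_speakers)]
    if min_speakers is not None:
        flags += ["--min-speakers", str(min_speakers)]
    if max_speakers is not None:
        flags += ["--max-speakers", str(max_speakers)]
    return flags


# "<HF_TOKEN>" is a placeholder; in practice read the token from an
# environment variable instead of hard-coding it.
base = ["insanely-fast-whisper", "--file-name", "meeting.wav"]
print(" ".join(base + diarization_flags("<HF_TOKEN>", num_speakers=2)))
```

Giving the speaker count when you know it (two for a one-on-one interview, for example) generally helps the diarizer; otherwise leave it unset and let the model estimate.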
Example Output
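The original output sample is not reproduced here, so the following is an illustrative stand-in. It assumes the transcript JSON has top-level speakers, chunks and text keys, with each speaker turn carrying a speaker label, a [start, end] timestamp in seconds, and the spoken text. The dialogue itself is invented. The sketch shows how such a file could be turned into readable, speaker-labeled lines.

```python
import json

# Illustrative transcript only: the layout is an assumption about the JSON
# the CLI writes, and the dialogue is made up for this example.
sample = json.loads("""
{
  "speakers": [
    {"speaker": "SPEAKER_00", "timestamp": [0.0, 4.2], "text": " Welcome back to the show."},
    {"speaker": "SPEAKER_01", "timestamp": [4.2, 7.9], "text": " Thanks for having me."}
  ],
  "chunks": [
    {"timestamp": [0.0, 4.2], "text": " Welcome back to the show."},
    {"timestamp": [4.2, 7.9], "text": " Thanks for having me."}
  ],
  "text": " Welcome back to the show. Thanks for having me."
}
""")


def format_turns(transcript: dict) -> list[str]:
    """Render each diarized turn as '[start - end] SPEAKER: text'."""
    lines = []
    for turn in transcript["speakers"]:
        start, end = turn["timestamp"]
        text = turn["text"].strip()
        lines.append(f"[{start:6.1f}s - {end:6.1f}s] {turn['speaker']}: {text}")
    return lines


print("\n".join(format_turns(sample)))
```

For a real run, replace the embedded sample with json.load on the file passed to --transcript-path.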
Translation
Transcribe and translate audio from 99 languages into English in a single pass.
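Translation uses the same command with the task switched from transcribe to translate. A minimal sketch, assuming the CLI's --task and --language flags and that the language flag accepts a short code such as fr; the helper name and file names are hypothetical.

```python
from typing import Optional


def build_translate_command(
    audio_path: str,
    source_language: Optional[str] = None,
    transcript_path: str = "translated.json",
) -> list[str]:
    """Assemble a CLI call that transcribes audio and translates it to English."""
    cmd = [
        "insanely-fast-whisper",
        "--file-name", audio_path,
        "--task", "translate",  # Whisper's translate task always targets English
        "--transcript-path", transcript_path,
    ]
    if source_language:
        # Optional: name the spoken language instead of relying on auto-detection.
        cmd += ["--language", source_language]
    return cmd


# Hypothetical French recording.
print(" ".join(build_translate_command("entretien.mp3", source_language="fr")))
```

Note that --language describes the audio being spoken, not the output: the translated text is English regardless.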
How It Compares
| Feature | Insanely Fast Whisper | OpenAI Whisper API | Google STT | AWS Transcribe |
|---|---|---|---|---|
| Accuracy | Large-v3 (best) | Large-v2 | Varies by model | Custom models |
| Speed | 150 min in 98 s | API-limited | API-limited | Near real time |
| Cost | Free (local GPU) | $0.006/min | $0.024/min | $0.024/min |
| Privacy | Fully local | Cloud upload | Cloud upload | Cloud upload |
| Speaker ID | Yes (diarize) | No | Yes | Yes |
| Translation | 99 languages to English | Yes (to English) | No | No |
| Self-hosted | Yes | No | No | No |
| Batch processing | Yes | Limited | Yes | Yes |
Use Cases
Podcasters
Generate full episode transcripts and show notes in seconds, not hours
Journalists
Transcribe interviews with speaker labels to identify who said what
Researchers
Process hundreds of hours of recorded interviews for qualitative analysis
Lawyers
Transcribe depositions, hearings, and client calls with zero cloud exposure
Content Creators
Turn YouTube videos into blog posts, captions, and social content
Students
Convert lecture recordings into searchable, study-ready text notes
Tips
1. Use a GPU with at least 10GB of VRAM for large-v3. On smaller GPUs, use distil-large-v3, which delivers nearly the same accuracy at roughly 6x the speed.
2. For long files (2 hours or more), enable Flash Attention 2 (--flash True) with --batch-size 24 to maximize throughput. If you run out of memory, lower the batch size.
3. Speaker diarization requires a Hugging Face token. Accept the terms for pyannote/speaker-diarization-3.1 and pyannote/segmentation-3.0 on the Hub first.
4. Always specify --language if you know the source language. Auto-detection works, but it guesses from the opening of the audio and can pick the wrong language when a file starts with silence, music, or speech in another language.
5. For clean JSON processing, write the transcript to a file with --transcript-path output.json, then parse that file with jq.
6. For maximum accuracy on noisy audio, preprocess with ffmpeg: ffmpeg -i input.mp3 -af "highpass=f=200,lowpass=f=3000" clean.wav
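The ffmpeg preprocessing step from the last tip can be scripted when many files need cleaning. The sketch below only builds the command, using the same highpass and lowpass cutoffs as the tip. The helper name is hypothetical, and the cutoffs are the tip's values, not tuned recommendations.

```python
import shlex


def build_denoise_command(
    src: str,
    dst: str,
    highpass_hz: int = 200,
    lowpass_hz: int = 3000,
) -> list[str]:
    """Assemble an ffmpeg call that band-limits audio to the speech range."""
    audio_filter = f"highpass=f={highpass_hz},lowpass=f={lowpass_hz}"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


# Same file names as in the tip above.
print(shlex.join(build_denoise_command("input.mp3", "clean.wav")))

# To run it (requires ffmpeg on PATH):
#   import subprocess
#   subprocess.run(build_denoise_command("input.mp3", "clean.wav"), check=True)
```

Filtering this aggressively removes rumble and hiss but also some speech detail, so it is worth comparing transcripts with and without it on a short clip first.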