Local models require the Python backend server to be running. See the GitHub repository for setup instructions.
OpenAI API Key
Deepgram API Key
Gladia API Key
ElevenLabs API Key
Deepgram AI Features (English Only)
Short summary for fast insight.
Labels main topics for quick understanding.
Transcription Prompt
Price Calculator
File Duration: N/A
API Base Cost: N/A
Summarization: $0.00
Topic Detection: $0.00
Total Estimated: $0.00
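The calculator rows above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the app's actual implementation: the function names, the per-minute rate, and the flat add-on amounts are all hypothetical placeholders, and real provider pricing should be taken from the provider's own pricing page.

```python
from typing import Optional


def format_usd(amount: Optional[float]) -> str:
    """Render a dollar amount the way the calculator rows do, or N/A if unknown."""
    return "N/A" if amount is None else f"${amount:.2f}"


def estimate_cost(
    duration_seconds: Optional[float],
    rate_per_minute: float,            # assumption: provider bills per audio minute
    summarization_cost: float = 0.0,   # assumption: add-ons passed in as flat amounts
    topic_detection_cost: float = 0.0,
) -> dict:
    """Return the calculator rows for one file."""
    if duration_seconds is None:
        base = None                    # no file loaded yet
    else:
        base = duration_seconds / 60 * rate_per_minute
    total = (base or 0.0) + summarization_cost + topic_detection_cost
    return {
        "API Base Cost": format_usd(base),
        "Summarization": format_usd(summarization_cost),
        "Topic Detection": format_usd(topic_detection_cost),
        "Total Estimated": format_usd(total),
    }


if __name__ == "__main__":
    # One hour of audio at a placeholder rate of $0.006 per minute.
    for label, value in estimate_cost(3600, 0.006).items():
        print(f"{label}: {value}")
```

With no file loaded (`duration_seconds=None`), the sketch yields the same empty state shown above: `N/A` for the base cost and `$0.00` for the remaining rows.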
Media Preview
0:00 / 0:00
Transcript
Default
Numbered
SRT
VTT
TSV
Summary
Topics
Default (.txt)
Numbered (.txt)
SRT (.srt)
VTT (.vtt)
TSV (.tsv)
Summary (.txt)
Topics (.txt)
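The SRT, VTT, and TSV exports differ mainly in how timestamps are written. The sketch below shows one way to produce each format from a list of timed segments. The segment shape (`start` and `end` in seconds, plus `text`) and the function names are assumptions for illustration, and the TSV layout (millisecond integers under a `start`, `end`, `text` header) is assumed to mirror Whisper's command-line output rather than taken from this app.

```python
def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Format seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments: list) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


def to_tsv(segments: list) -> str:
    """Tab-separated rows with start/end as integer milliseconds (assumed layout)."""
    rows = ["start\tend\ttext"]
    for seg in segments:
        text = seg["text"].strip().replace("\t", " ")
        rows.append(f"{round(seg['start'] * 1000)}\t{round(seg['end'] * 1000)}\t{text}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello"},
        {"start": 2.5, "end": 5.0, "text": " world"},
    ]
    print(to_srt(demo))
    print(to_vtt(demo))
    print(to_tsv(demo))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 s into an invalid `60` in the seconds field.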
Copy to clipboard
Processing Transcript
Summary
Detected Topics
Model Information
Local Models (Whisper)
Tiny
Fastest, but least accurate.
VRAM req. ~1 GB
Base
Enhanced accuracy over Tiny.
VRAM req. ~1 GB
Small
Balanced speed and performance.
VRAM req. ~2 GB
Medium
Improved nuance and accent handling.
VRAM req. ~4 GB
Turbo
Second-best quality.
VRAM req. ~6 GB
Large
Best quality, but very slow. A regular computer may struggle to run this.
VRAM req. ~10 GB
Whisper is OpenAI's open-source general-purpose speech recognition model. The available local models offer tradeoffs between transcription accuracy and processing speed. For an in-depth overview, refer to the Whisper Documentation.
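As a rough illustration of the accuracy and speed tradeoff, the VRAM figures listed above can be used to pick the most accurate local model that fits a given GPU. This is a sketch under stated assumptions: the lookup table simply copies the approximate requirements from this page, the quality ordering follows the descriptions above, and `pick_model` is a hypothetical helper, not part of Whisper or of this app.

```python
from typing import Optional

# Approximate VRAM requirements in GB, copied from the list above and ordered
# from least to most accurate as described there.
WHISPER_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 4,
    "turbo": 6,
    "large": 10,
}


def pick_model(available_vram_gb: float) -> Optional[str]:
    """Return the most accurate model whose listed VRAM need fits, else None."""
    choice = None
    for name, required_gb in WHISPER_VRAM_GB.items():
        if required_gb <= available_vram_gb:
            choice = name  # later entries are more accurate, so keep overwriting
    return choice


if __name__ == "__main__":
    for vram in (0.5, 1, 4, 8, 12):
        print(f"{vram} GB -> {pick_model(vram)}")
```

The returned name can then be passed to whatever loads the model; with the open-source `openai-whisper` package that would typically be `whisper.load_model(name)`. The listed requirements are approximate, so leaving some headroom is prudent.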
Cloud Models
Whisper-1
OpenAI's original cloud transcription model; still a good, low-cost option. Supports native timestamps.
Offers $5 in free API credits (~14 hours of transcription).
4o Mini
Uses GPT-4o mini to transcribe audio. Supports prompting. Does not support native timestamps.
Half the price of 4o.
4o
Uses GPT-4o to transcribe audio. Supports prompting. Does not support native timestamps.
Most expensive model.
Nova-3
Deepgram's cloud model with multi-language support. Offers advanced AI features like summarization and topic detection for English content.
Offers $200 in free API credits (~640 hours of transcription).
Gladia v2
Gladia with automatic language detection and multi-language support.
Offers 10 hours of free API transcription per month.
Scribe-v1
ElevenLabs' speech-to-text model supporting 99 languages.
Offers 10,000 free API credits per month (~4.87 hours of transcription).
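The free-tier figures above imply an approximate rate for each provider, which makes the offers easier to compare. The short sketch below derives those rates purely from the numbers on this page, reading the ElevenLabs figure as 4.87 hours. The helper name is hypothetical, and the results are back-of-the-envelope estimates, not official pricing.

```python
def implied_rate_per_hour(free_amount: float, hours_covered: float) -> float:
    """Units of credit (dollars or provider credits) consumed per hour of audio."""
    return free_amount / hours_covered


# Figures taken from the model descriptions above.
whisper_usd_per_hour = implied_rate_per_hour(5, 14)        # $5 covers ~14 hours
nova_usd_per_hour = implied_rate_per_hour(200, 640)        # $200 covers ~640 hours
scribe_credits_per_hour = implied_rate_per_hour(10_000, 4.87)  # credits, not dollars

if __name__ == "__main__":
    print(f"Whisper-1: ~${whisper_usd_per_hour:.3f} per hour")
    print(f"Nova-3:    ~${nova_usd_per_hour:.3f} per hour")
    print(f"Scribe-v1: ~{scribe_credits_per_hour:.0f} credits per hour")
```

Dividing an hourly dollar figure by 60 gives a per-minute rate of the kind a price calculator would multiply by the file duration.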