Transcribe

speech → text via whisper — 5 tiers, kv-cached merged decoder, pure rust on homelab


drop an audio file (mp3 / aac / m4a / flac / wav / ogg-vorbis):

drag an audio file here, or click to browse

model:

turbo is the recommended default — distilled from large-v3, gives v3-grade accuracy at roughly 2× the throughput thanks to a 4-layer decoder (vs v3's 32). for the raw quality ceiling pick large-v3; for latency-first pick tiny or base.

language:

auto-detect runs one extra decoder step with just [sot], then picks the argmax over the language tokens. pin a language if you know it: slightly faster, and avoids misdetection on short clips.
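
a rough sketch of that auto-detect step in rust, for illustration only and not this site's actual code. it assumes the decoder has already been run once on [sot] and handed back one logit per vocabulary entry. the names `detect_language` and `lang_tokens` are hypothetical, and the language-token ids are supplied by the caller rather than hard-coded, since they depend on the tokenizer in use.

```rust
/// Hypothetical helper (not taken from the site's code): choose the most
/// likely language from the logits of the single decoder step run on [sot].
///
/// `logits` holds one value per vocabulary entry.
/// `lang_tokens` maps language-token ids to language codes; the ids are an
/// assumption the caller must fill in from the tokenizer.
///
/// Returns the winning code and its probability among the language tokens.
pub fn detect_language<'a>(
    logits: &[f32],
    lang_tokens: &[(usize, &'a str)],
) -> Option<(&'a str, f32)> {
    // Look only at language tokens; skip ids outside the vocabulary and
    // non-finite logits so the softmax below stays well defined.
    let candidates: Vec<(&'a str, f32)> = lang_tokens
        .iter()
        .filter_map(|&(id, code)| {
            logits
                .get(id)
                .copied()
                .filter(|l| l.is_finite())
                .map(|l| (code, l))
        })
        .collect();

    // Argmax over the language tokens; None if there were no candidates.
    let &(best_code, best_logit) = candidates
        .iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))?;

    // Softmax restricted to the language tokens, shifted by the max logit
    // for numerical stability. The winner contributes exp(0) = 1.
    let denom: f32 = candidates
        .iter()
        .map(|&(_, l)| (l - best_logit).exp())
        .sum();

    Some((best_code, 1.0 / denom))
}
```

the returned probability is only relative to the other language tokens, so a low value on a short clip is a hint to pin the language by hand instead.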

translate to english (instead of transcribing source language)
