Transcribe
speech → text via whisper — 5 tiers, kv-cached merged decoder, pure rust on homelab
turbo is the recommended default: a pruned, fine-tuned large-v3 that gives near-v3 accuracy at roughly 2× the throughput thanks to a 4-layer decoder (vs v3's 32). for the raw quality ceiling pick large-v3; for latency-first pick tiny or base.

auto-detect runs one extra decoder step with just [sot], then picks the language-token argmax. pin a language if you know it: slightly faster, and it avoids misdetection on short clips.
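
a minimal rust sketch of the auto-detect step described above, assuming the decoder has already been run once on [sot] and returned vocabulary-sized logits. the function name `detect_language`, the `(token id, language code)` table, and any ids used with it are hypothetical and not taken from this tool's source.

```rust
/// Pick the most likely language from the logits of a single decoder step.
///
/// `logits` is the vocabulary-sized output obtained after feeding only the
/// start-of-transcript token. `lang_tokens` maps each language token id to
/// its language code (hypothetical table; real ids depend on the tokenizer).
/// Returns the code with the highest logit, or `None` if no candidate id is
/// inside the logits slice.
pub fn detect_language<'a>(
    logits: &[f32],
    lang_tokens: &[(usize, &'a str)],
) -> Option<&'a str> {
    lang_tokens
        .iter()
        // Look only at language-token positions; skip ids that are out of range.
        .filter_map(|&(id, code)| logits.get(id).map(|&logit| (logit, code)))
        // Ignore NaN logits so they can never win the comparison.
        .filter(|(logit, _)| !logit.is_nan())
        // Argmax restricted to the language tokens.
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, code)| code)
}
```

the argmax is restricted to language-token positions, so a high logit on any other token cannot be mistaken for a language. pinning a language skips this call and the decoder step that feeds it.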