Speech-to-Text

AI Models

Transcribe audio locally with no API costs

What it can do

  • Local transcription — Convert speech to text completely offline, no API key required
  • Multiple model sizes — tiny (fastest) → base → small → medium → large (most accurate)
  • Output formats — Plain text, SRT subtitles, VTT captions, or JSON with timestamps
  • Translation mode — Translate any language audio directly to English text
  • Wide format support — WAV, MP3, M4A, FLAC, OGG, and more
  • Auto model caching — Downloads models on first use, fully offline after that

Questions to try

  • "Transcribe this podcast.mp3 using the medium model"
  • "Convert this interview to SRT subtitles"
  • "Transcribe my voice memo and translate it to English"
  • "Generate VTT captions for this video's audio track"
  • "Use the large model for this important lecture recording"
  • "Get JSON output with word-level timestamps"

Pro tips

  • tiny = fast but rough, small = good balance, medium = professional quality, large = maximum accuracy
  • The first run downloads the model (40 MB to 3 GB depending on size); after that, everything runs offline
  • SRT/VTT formats include timestamps for subtitle syncing
  • Translation mode outputs English regardless of input language
  • JSON output includes segment-level and word-level timing data
  • Because nothing is uploaded once the model has been downloaded, it suits private or sensitive recordings
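
To make the subtitle tips more concrete, here is a minimal sketch of how timed segments could be turned into SRT and VTT text. It is an illustration, not this tool's actual implementation: the segment shape (a dict with "start", "end" in seconds, and "text") and the function names format_timestamp, to_srt, and to_vtt are assumptions made for the example. The two formats differ mainly in the millisecond separator (a comma in SRT, a period in VTT), the numbered cues in SRT, and the WEBVTT header line.

```python
from typing import Dict, List, Union

# Assumed segment shape (hypothetical, for illustration only):
# {"start": seconds, "end": seconds, "text": transcript text}
Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float, separator: str) -> str:
    """Render a time in seconds as HH:MM:SS<separator>mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]), ",")
        end = format_timestamp(float(seg["end"]), ",")
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


def to_vtt(segments: List[Segment]) -> str:
    """Build WebVTT text: WEBVTT header, period before the milliseconds."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(float(seg["start"]), ".")
        end = format_timestamp(float(seg["end"]), ".")
        lines.append(f"{start} --> {end}")
        lines.append(str(seg["text"]).strip())
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not real transcription output.
    sample = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " Let's get started."},
    ]
    print(to_srt(sample))
    print(to_vtt(sample))
```

If the JSON output is requested instead, a small script along these lines could produce either subtitle format later without transcribing the audio again, provided the JSON exposes comparable start, end, and text fields.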