Speech-to-Text

AI Models

Transcribe audio locally with no API costs

What you can do

  • Local transcription: Convert speech to text completely offline, no API key required
  • Multiple model sizes: tiny (fastest) → base → small → medium → large (most accurate)
  • Output formats: Plain text, SRT subtitles, VTT captions, or JSON with timestamps
  • Translation mode: Translate audio in other languages directly into English text
  • Wide format support: WAV, MP3, M4A, FLAC, OGG, and more
  • Auto model caching: Downloads each model on first use, then runs fully offline
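
To make the difference between the SRT and VTT output formats concrete, here is a minimal Python sketch that renders timed segments into both. It is an illustration only: the `Segment` class and the `format_timestamp`, `to_srt`, and `to_vtt` helpers are hypothetical names, not part of this tool, and the tool's own output may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span (illustrative fields, not the tool's schema)."""
    start: float  # seconds from the beginning of the audio
    end: float
    text: str


def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Render seconds as HH:MM:SS<mark>mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: list[Segment]) -> str:
    """WebVTT: 'WEBVTT' header line, period before the milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        cues.append(f"{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(cues)


segments = [
    Segment(0.0, 2.5, " Hello there."),
    Segment(2.5, 5.04, "Second line"),
]
print(to_srt(segments))
print(to_vtt(segments))
```

The two formats carry the same timing information. SRT numbers each cue and uses a comma as the decimal mark, while VTT starts with a `WEBVTT` header and uses a period.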

Questions to try

  • "Transcribe this podcast.mp3 using the medium model"
  • "Convert this interview to SRT subtitles"
  • "Transcribe my voice memo and translate it to English"
  • "Generate VTT captions for this video's audio track"
  • "Use the large model for this important lecture recording"
  • "Get JSON output with word-level timestamps"

Pro tips

  • tiny = fast but rough, small = good balance, medium = professional quality, large = maximum accuracy
  • The first run downloads the model (40MB–3GB depending on size); after that it runs fully offline
  • SRT/VTT formats include timestamps for subtitle syncing
  • Translation mode outputs English regardless of input language
  • JSON output includes segment-level and word-level timing data
  • Works completely offline after the initial model download, which is great for privacy
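
As a rough illustration of working with segment-level and word-level timing data, the Python sketch below parses a small JSON transcript and pulls out the words spoken in a given time window. The schema (`segments`, `words`, `word`, `start`, `end`) and the `words_between` helper are assumptions made for this example; check the tool's actual JSON output for the real field names.

```python
import json

# Hypothetical JSON output; the real field names may differ.
SAMPLE = """
{
  "segments": [
    {
      "start": 0.0,
      "end": 1.8,
      "text": "Hello and welcome.",
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "and", "start": 0.5, "end": 0.7},
        {"word": "welcome.", "start": 0.8, "end": 1.8}
      ]
    }
  ]
}
"""


def words_between(transcript: dict, start: float, end: float) -> list[str]:
    """Return the words whose timing falls entirely inside [start, end]."""
    return [
        w["word"]
        for seg in transcript.get("segments", [])
        for w in seg.get("words", [])
        if w["start"] >= start and w["end"] <= end
    ]


transcript = json.loads(SAMPLE)
print(words_between(transcript, 0.45, 2.0))
```

Segment-level timing is enough for subtitle cues. Word-level timing is what makes finer lookups like this one possible, for example jumping to the moment a specific word was spoken.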