
Voice

Overview

Open Claw supports voice interaction through multiple systems: wake word detection, continuous voice conversation (talk mode), and text-to-speech for spoken responses.

Wake Words

Swabble (macOS)

Swabble is a native macOS daemon that uses Apple's Speech.framework to provide always-on, on-device voice wake word detection.

Features:

Local-only processing: audio does not leave your device during wake word detection

Default wake word: clawd (with the alias claude)

Customizable wake words

Continuous audio capture and transcription

Hook execution: triggers shell commands when the wake word is detected

File transcription: converts audio files to text (TXT or SRT format)

Configurable cooldown, minimum character count, and timeout

How it works:

Swabble listens continuously using the system microphone

When it detects the wake word in the spoken text, it captures the speech that follows

The captured text is sent to your agent through a configured hook command

The agent processes the voice command and responds
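The detection steps above can be sketched in a few lines of Python. This is only an illustration of the idea (find a wake word in a transcript, keep the speech after it, and apply the cooldown and minimum character count), not Swabble's actual implementation. The class name `WakeWordGate` and its field names are hypothetical and do not correspond to real Swabble configuration keys.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class WakeWordGate:
    """Hypothetical gate; field names are illustrative, not Swabble config keys."""

    wake_words: Tuple[str, ...] = ("clawd", "claude")
    min_chars: int = 3        # ignore captured commands shorter than this
    cooldown_s: float = 2.0   # suppress re-triggering inside this window
    _last_fire: float = field(default=float("-inf"), repr=False)

    def extract_command(self, transcript: str, now: Optional[float] = None) -> Optional[str]:
        """Return the speech following the first wake word, or None if gated."""
        now = time.monotonic() if now is None else now
        # Match any wake word as a whole word, then skip trailing punctuation.
        pattern = r"\b(?:" + "|".join(map(re.escape, self.wake_words)) + r")\b[\s,.:!?]*"
        match = re.search(pattern, transcript, flags=re.IGNORECASE)
        if match is None:
            return None  # no wake word in this transcript
        command = transcript[match.end():].strip()
        if len(command) < self.min_chars:
            return None  # too short to be a real command
        if now - self._last_fire < self.cooldown_s:
            return None  # still cooling down from the previous trigger
        self._last_fire = now
        return command
```

In a real setup, the returned text would then be handed to the configured hook command; that step is omitted here because the hook interface is not described in this page.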

Node Wake Words

On the iOS and Android companion apps, voice wake is handled natively:

Wake word configuration is owned by the Gateway

Nodes receive the wake word config when they connect

Detection uses platform-native speech recognition

Talk Mode

Talk mode enables continuous voice conversations: speak naturally and hear your agent respond.

How it works

Speech-to-Text: your voice is transcribed in real time (Deepgram streaming or platform-native STT)

Agent Processing: the transcribed text is sent to your agent as a regular message

Text-to-Speech: the agent's response is spoken back to you

Voice State Machine

Talk mode transitions between four states:

| State | Description |
|-------|-------------|
| Idle | Not actively listening |
| Listening | Capturing and transcribing your speech |
| Thinking | The agent is processing your request |
| Speaking | The agent's response is being spoken |
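As a rough illustration, the four states can be modeled as a small table-driven state machine. The state names come from the table above; the event names (`start`, `utterance_end`, `response_ready`, `playback_done`, `stop`) and the specific transitions are assumptions made for this sketch, since the page does not list them.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Assumed transition table: (current state, event) -> next state.
# Event names are hypothetical and chosen only for illustration.
TRANSITIONS = {
    (VoiceState.IDLE, "start"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "utterance_end"): VoiceState.THINKING,
    (VoiceState.THINKING, "response_ready"): VoiceState.SPEAKING,
    (VoiceState.SPEAKING, "playback_done"): VoiceState.LISTENING,
    (VoiceState.LISTENING, "stop"): VoiceState.IDLE,
    (VoiceState.THINKING, "stop"): VoiceState.IDLE,
    (VoiceState.SPEAKING, "stop"): VoiceState.IDLE,
}


def next_state(state: VoiceState, event: str) -> VoiceState:
    """Return the next state, or stay in place if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a dictionary makes unexpected events harmless: an event that is not valid in the current state leaves the state unchanged.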

Text-to-Speech Providers

| Provider | Description |
|----------|-------------|
| ElevenLabs | High-quality voice synthesis with voice selection |
| OpenAI TTS | OpenAI's text-to-speech API |

Voice Preferences

Voice selection: choose from the available TTS voices

Custom system prompt: override the agent's personality for voice mode

Custom response format: control how the agent formats spoken responses

Language support: voice strings are localized for 18+ languages

Voice Commands

Multi-Intent Detection

Agents can detect and execute multi-step voice commands:

> "Create a calendar event for 3 PM tomorrow, then email the team about it, and post a reminder in Slack"

This is automatically parsed into a sequence of commands. Each command runs in order, and its results flow into the next step.
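The "results flow into the next step" behavior can be pictured as a simple sequential pipeline. The sketch below assumes the voice command has already been parsed into an ordered list of steps; the step functions (`create_event`, `email_team`, `post_reminder`) are stubs invented for this example and do not call any real calendar, email, or Slack API.

```python
from typing import Any, Callable, List, Optional

# A step receives the previous step's result (None for the first step).
Step = Callable[[Optional[Any]], Any]


def run_sequence(steps: List[Step]) -> List[Any]:
    """Run steps in order, passing each result to the next step."""
    results: List[Any] = []
    previous: Optional[Any] = None
    for step in steps:
        previous = step(previous)
        results.append(previous)
    return results


# Stub steps standing in for real tools (hypothetical, for illustration only).
def create_event(_prev: Optional[Any]) -> dict:
    return {"event": "Meeting", "time": "tomorrow 15:00"}


def email_team(prev: dict) -> dict:
    return {"email_subject": f"Invite: {prev['event']} at {prev['time']}"}


def post_reminder(prev: dict) -> dict:
    return {"slack_message": f"Reminder sent: {prev['email_subject']}"}


results = run_sequence([create_event, email_team, post_reminder])
```

How the agent actually splits a spoken sentence into steps is not described here, so the parsing stage is deliberately left out of the sketch.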

Tool Execution

During voice conversations, agents can execute tools just as they do in text conversations: browse the web, run code, manage files, control devices, and more. Results are summarized and spoken back.

Action Truth Enforcement

Voice mode includes validation that the agent's claims match actual tool outcomes. If the agent says "I sent the email" but the email tool failed, the system catches the discrepancy and reports the actual result.
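One way to picture this check is a comparison between phrases in the agent's response and the recorded outcome of each tool call. The sketch below is a deliberately simple assumption: the phrase-to-tool mapping `CLAIM_PATTERNS`, the `ToolOutcome` type, and the function `enforce_action_truth` are all hypothetical, and the real validation is likely more sophisticated than substring matching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolOutcome:
    tool: str          # tool name, e.g. "email"
    ok: bool           # whether the tool call succeeded
    detail: str = ""   # error detail when ok is False


# Hypothetical mapping from claim phrases to the tool each one depends on.
CLAIM_PATTERNS = {
    "sent the email": "email",
    "created the event": "calendar",
}


def enforce_action_truth(response: str, outcomes: List[ToolOutcome]) -> str:
    """Return the response unchanged, or a correction if a claim is unsupported."""
    status = {o.tool: o for o in outcomes}
    lowered = response.lower()
    corrections = []
    for phrase, tool in CLAIM_PATTERNS.items():
        if phrase not in lowered:
            continue
        outcome = status.get(tool)
        if outcome is None:
            corrections.append(f"The {tool} tool was not run.")
        elif not outcome.ok:
            corrections.append(f"The {tool} tool failed: {outcome.detail}")
    return " ".join(corrections) if corrections else response
```

For example, if the response claims the email was sent but the recorded outcome for `email` has `ok=False`, the function returns the failure message instead of the original claim.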

Voice Calling (Plugin)

The Voice Call plugin adds SIP telephony support:

Inbound call handling

Outbound calls (provider-dependent)

Real-time bidirectional audio (PCM streams)

TTS synthesis injected into call audio

Quota Management

Voice services may have usage quotas:

Monthly minute allocation for TTS + STT

Per-session tracking

Warning at 80% usage

Automatic cutoff at the quota limit
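The quota rules listed above (a monthly allocation, per-session tracking, a warning at 80%, and a cutoff at the limit) could be modeled roughly as follows. The `VoiceQuota` class, its field names, and the `"ok"` / `"warning"` / `"cutoff"` return values are assumptions for this sketch; the page does not specify how quotas are tracked or reported.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VoiceQuota:
    """Hypothetical quota tracker; not the product's actual data model."""

    monthly_minutes: float             # combined TTS + STT allocation
    warn_ratio: float = 0.8            # warn once 80% of the quota is used
    used_minutes: float = 0.0
    per_session: Dict[str, float] = field(default_factory=dict)

    def record(self, session_id: str, minutes: float) -> str:
        """Record usage and return "ok", "warning", or "cutoff"."""
        if self.used_minutes >= self.monthly_minutes:
            return "cutoff"  # quota already exhausted, nothing is recorded
        # Never record more than what remains in the monthly allocation.
        allowed = min(minutes, self.monthly_minutes - self.used_minutes)
        self.used_minutes += allowed
        self.per_session[session_id] = self.per_session.get(session_id, 0.0) + allowed
        if self.used_minutes >= self.monthly_minutes:
            return "cutoff"
        if self.used_minutes >= self.warn_ratio * self.monthly_minutes:
            return "warning"
        return "ok"
```

A caller would check the returned status after each voice session segment and stop TTS and STT once `"cutoff"` is returned.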
