Voice Messages
Mutiro has built-in support for voice messages using Google's Chirp3 HD voices, so humans and AI agents can communicate by voice. Voice capabilities are available to every agent automatically, with no additional configuration.
Overview
Voice messaging in Mutiro works seamlessly in both directions:
- Send voice to agents: Record and send voice messages that are automatically transcribed
- Receive voice from agents: Agents can respond with voice messages using text-to-speech
The Mutiro platform handles all the complexity—transcription, synthesis, and delivery—so agents can focus on responding to messages naturally.
How It Works
For Users
Sending Voice Messages
Using the CLI:
Using the Desktop or Mobile app:
- Press and hold the microphone button
- Speak your message
- Release to send
The audio is automatically:
- Uploaded to Mutiro's storage service
- Transcribed to text
- Delivered to the agent with both audio and transcript
For Agents
Receiving Voice Messages
Agents receive voice messages just like text messages. The message includes:
- The transcribed text content
- A reference to the original audio file
No special handling is required: agents process the transcribed text the same way they process any typed message.
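As a rough illustration of the two fields described above, the sketch below models an incoming message in Python. The class name `IncomingMessage`, the field names `text` and `audio_ref`, and the example audio path are all hypothetical; the actual message shape Mutiro delivers is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of a message as seen by an agent (not Mutiro's real schema)."""
    text: str                        # typed text, or the transcript of a voice message
    audio_ref: Optional[str] = None  # reference to the original audio, if any

    @property
    def is_voice(self) -> bool:
        # A message counts as voice when it carries an audio reference.
        return self.audio_ref is not None


def extract_prompt(message: IncomingMessage) -> str:
    """Return the text the agent should reason over, regardless of medium."""
    return message.text.strip()


typed = IncomingMessage("What's the weather like today?")
spoken = IncomingMessage(" What's the weather like today? ", audio_ref="audio/abc123.ogg")

# Both messages yield the same prompt; only the is_voice flag differs.
print(extract_prompt(typed) == extract_prompt(spoken), typed.is_voice, spoken.is_voice)
```

The `is_voice` flag is only useful later, when the agent decides which medium to reply in.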
Sending Voice Responses
Agents have access to a send_voice_message tool that's baked into Mutiro's platform instructions. When an agent wants to respond with voice, it simply uses this tool with the text to be spoken.
The Mutiro platform automatically:
- Synthesizes speech from the text using Google Chirp3 HD
- Uploads the audio file to cloud storage
- Delivers both audio and text to the user
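The three platform steps above can be pictured as a small pipeline. The sketch below is not Mutiro's implementation: the function name `deliver_voice_reply` and its three injected callables are assumptions made for illustration, and the demo uses stand-ins instead of real synthesis or storage.

```python
from typing import Callable, List, Tuple


def deliver_voice_reply(
    speech: str,
    synthesize: Callable[[str], bytes],
    upload: Callable[[bytes], str],
    deliver: Callable[[str, str], None],
) -> str:
    """Run the synthesize -> upload -> deliver steps in order (hypothetical sketch)."""
    audio = synthesize(speech)   # step 1: text to audio bytes
    audio_ref = upload(audio)    # step 2: store the audio and get a reference back
    deliver(audio_ref, speech)   # step 3: hand both audio and text to the user
    return audio_ref


# Stand-ins so the sketch runs without any external service.
sent: List[Tuple[str, str]] = []
ref = deliver_voice_reply(
    "Hello there!",
    synthesize=lambda text: text.encode("utf-8"),       # fake "audio" bytes
    upload=lambda audio: f"stub://audio/{len(audio)}",  # fake storage reference
    deliver=lambda audio_ref, text: sent.append((audio_ref, text)),
)
print(ref, sent)
```

Passing the steps in as callables keeps the ordering visible while leaving every real service out of the example.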
Available Voices
Mutiro uses Google Chirp3 HD voices—high-quality, natural-sounding voices named after celestial objects. The default voice is en-US-Chirp3-HD-Orus (male).
Supported Voice Names
Female Voices:
- Achernar
- Aoede
- Autonoe
- Callirrhoe
- Despina
- Erinome
- Gacrux
- Kore
- Laomedeia
- Leda
- Pulcherrima
- Sulafat
- Vindemiatrix
- Zephyr
Male Voices:
- Achird
- Algenib
- Algieba
- Alnilam
- Charon
- Enceladus
- Fenrir
- Iapetus
- Orus (default)
- Puck
- Rasalgethi
- Sadachbia
- Sadaltager
- Schedar
- Umbriel
- Zubenelgenubi
Supported Languages
Chirp3 HD supports voice synthesis in 40+ languages. The default is en-US, but agents can specify a different language by its BCP-47 code, including:
ar-XA, bn-IN, da-DK, nl-BE, nl-NL, en-AU, en-IN, en-GB, en-US, fi-FI, fr-CA, fr-FR, de-DE, gu-IN, hi-IN, id-ID, it-IT, ja-JP, kn-IN, ko-KR, ml-IN, cmn-CN, mr-IN, nb-NO, pl-PL, pt-BR, ru-RU, es-ES, es-US, sw-KE, sv-SE, ta-IN, te-IN, th-TH, tr-TR, uk-UA, ur-IN, vi-VN
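An agent developer may want to check a requested language against the codes listed above before using it. The sketch below does that. The function name `resolve_locale` and the fall-back-to-default behavior are assumptions made for illustration, not documented Mutiro behavior.

```python
from typing import Optional

# Locale codes copied from the list above.
SUPPORTED_LOCALES = frozenset(
    """
    ar-XA bn-IN da-DK nl-BE nl-NL en-AU en-IN en-GB en-US fi-FI fr-CA fr-FR
    de-DE gu-IN hi-IN id-ID it-IT ja-JP kn-IN ko-KR ml-IN cmn-CN mr-IN nb-NO
    pl-PL pt-BR ru-RU es-ES es-US sw-KE sv-SE ta-IN te-IN th-TH tr-TR uk-UA
    ur-IN vi-VN
    """.split()
)
DEFAULT_LOCALE = "en-US"


def resolve_locale(requested: Optional[str]) -> str:
    """Return a listed locale code, falling back to the default (assumed policy)."""
    if not requested:
        return DEFAULT_LOCALE
    # Normalize case so "pt-br" and "PT-BR" both become "pt-BR".
    language, _, region = requested.strip().partition("-")
    candidate = f"{language.lower()}-{region.upper()}"
    return candidate if candidate in SUPPORTED_LOCALES else DEFAULT_LOCALE


print(resolve_locale("pt-br"))   # pt-BR
print(resolve_locale("xx-YY"))   # en-US
```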
List available voices via CLI:
Agent Implementation
No Code Changes Required
Voice support is built into Mutiro's platform instructions. Agents automatically have access to the send_voice_message tool without any code changes.
When you create or run an agent, voice capabilities are included in the system instructions that Mutiro provides to the AI model.
How Agents Use Voice
The send_voice_message tool is available to all agents and accepts:
- username: Target username (with or without @)
- conversation_id: The conversation ID (copied from message context)
- speech: Plain text to synthesize into speech
- language (optional): BCP-47 language code that overrides the language of the agent's default voice
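To make the parameter list concrete, here is a minimal sketch of the tool arguments as a Python data structure. The class `SendVoiceMessageArgs` and its `to_payload` method are hypothetical helpers. Only the four field names come from the list above, and stripping the "@" is an illustrative choice, since the tool accepts either form.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SendVoiceMessageArgs:
    """Hypothetical container for the send_voice_message arguments."""
    username: str                   # target username, with or without "@"
    conversation_id: str            # copied from the message context
    speech: str                     # plain text to synthesize
    language: Optional[str] = None  # optional BCP-47 code, e.g. "pt-BR"

    def to_payload(self) -> dict:
        """Return a plain dict, dropping the optional field when it is unset."""
        if not self.speech.strip():
            raise ValueError("speech must not be empty")
        payload = asdict(self)
        # Normalize the username to one form; the tool itself accepts both.
        payload["username"] = self.username.lstrip("@")
        if payload["language"] is None:
            del payload["language"]
        return payload


args = SendVoiceMessageArgs("@alice", "conv-123", "It's sunny today!")
print(args.to_payload())
```

The username and conversation ID in the example are placeholders, not real Mutiro identifiers.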
Agents can choose when to respond with voice based on context:
- User sent voice → respond with voice to match the medium
- User preference → ask "Would you like voice responses?"
- Content type → storytelling or emotional content works well as voice
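The three signals above could be combined into a simple decision rule. The sketch below is one possible heuristic, not platform behavior. In practice the model makes this choice from its instructions, and the function name, the content labels, and the priority order are all assumptions.

```python
from typing import Optional

# Content kinds assumed to work well as speech (illustrative labels only).
VOICE_FRIENDLY_CONTENT = {"story", "emotional"}


def should_reply_with_voice(
    user_sent_voice: bool,
    user_prefers_voice: Optional[bool] = None,
    content_kind: str = "general",
) -> bool:
    """Decide whether to answer with voice, checking the signals in priority order."""
    # 1. An explicit user preference always wins.
    if user_prefers_voice is not None:
        return user_prefers_voice
    # 2. Otherwise, match the medium the user chose.
    if user_sent_voice:
        return True
    # 3. Finally, fall back to the kind of content being delivered.
    return content_kind in VOICE_FRIENDLY_CONTENT


print(should_reply_with_voice(user_sent_voice=True))                            # True
print(should_reply_with_voice(user_sent_voice=True, user_prefers_voice=False))  # False
print(should_reply_with_voice(user_sent_voice=False, content_kind="story"))     # True
```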
Natural Speech Tips
When agents craft text for voice synthesis, they should:
- Write naturally using contractions ("it's", "we're", "don't")
- Use punctuation for pacing:
  - Commas (,) = short pauses ("How are you, friend?")
  - Ellipses (...) = longer pauses, hesitation ("Hmmm... let me think")
  - Hyphens (-) = sudden breaks ("I wanted to - wait, what?")
- Read the text aloud mentally to ensure natural flow
Setting Agent Voice
When creating an agent, you can specify the default voice:
Or configure it in your agent's settings. The format is {language-code}-Chirp3-HD-{VoiceName}.
Examples:
- en-US-Chirp3-HD-Orus (default, male, US English)
- en-US-Chirp3-HD-Zephyr (female, US English)
- pt-BR-Chirp3-HD-Orus (male, Brazilian Portuguese)
- fr-FR-Chirp3-HD-Despina (female, French)
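The voice identifiers in these examples share one pattern: a locale code, the literal `Chirp3-HD`, and a voice name. The sketch below parses that pattern and looks the name up in the voice lists from this page. The function `parse_voice_id` and the regular expression are illustrative assumptions, not part of Mutiro or Google's API.

```python
import re

# Voice names copied from the "Supported Voice Names" lists on this page.
FEMALE_VOICES = {
    "Achernar", "Aoede", "Autonoe", "Callirrhoe", "Despina", "Erinome", "Gacrux",
    "Kore", "Laomedeia", "Leda", "Pulcherrima", "Sulafat", "Vindemiatrix", "Zephyr",
}
MALE_VOICES = {
    "Achird", "Algenib", "Algieba", "Alnilam", "Charon", "Enceladus", "Fenrir",
    "Iapetus", "Orus", "Puck", "Rasalgethi", "Sadachbia", "Sadaltager", "Schedar",
    "Umbriel", "Zubenelgenubi",
}

# Assumed pattern: 2-3 letter language, 2 letter region, "Chirp3-HD", voice name.
VOICE_ID = re.compile(r"^(?P<language>[a-z]{2,3}-[A-Z]{2})-Chirp3-HD-(?P<name>[A-Za-z]+)$")


def parse_voice_id(voice_id: str) -> dict:
    """Split a voice identifier into language, name, and gender, or raise ValueError."""
    match = VOICE_ID.match(voice_id)
    if match is None:
        raise ValueError(f"not a Chirp3 HD voice identifier: {voice_id!r}")
    name = match.group("name")
    if name in FEMALE_VOICES:
        gender = "female"
    elif name in MALE_VOICES:
        gender = "male"
    else:
        raise ValueError(f"unknown voice name: {name!r}")
    return {"language": match.group("language"), "name": name, "gender": gender}


print(parse_voice_id("pt-BR-Chirp3-HD-Orus"))
```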
Best Practices
For Users
- Clear audio: Speak clearly in a quiet environment for best transcription
- Natural speech: Speak naturally—no need to enunciate excessively
- Short messages: Break long thoughts into multiple shorter voice messages
For Agent Developers
- Match the medium: If a user sends voice, consider responding with voice
- Confirm understanding: If a voice message was unclear, ask for clarification
- Choose appropriate voices: Select voices that match your agent's personality
- Natural language: Write speech text as you would speak it, not as formal text
- Test voices: Try different Chirp3 HD voices to find the best match
Example Workflow
- User: Sends voice message "What's the weather like today?"
  - Platform transcribes to text
  - Agent receives: "What's the weather like today?"
- Agent: Processes the request
  - Checks weather data
  - Uses the send_voice_message tool
  - Speech text: "It's currently 72 degrees and sunny! Perfect day for a walk."
- User: Receives both
  - Audio playback of the synthesized response
  - Text transcript for reference
Troubleshooting
Voice messages not working
Check audio permissions:
- Desktop/Mobile apps need microphone access
- Check system settings if recording fails
Verify connectivity:
Test voice synthesis:
Transcription issues
If transcriptions are inaccurate:
- Ensure clear audio input (reduce background noise)
- Speak at a moderate pace
- Try typing the message instead for critical information
Voice sounds unnatural
If synthesized voice sounds robotic:
- Add more natural punctuation (commas, ellipses)
- Use contractions and casual language
- Break long sentences into shorter ones
- Avoid special characters or formatting in speech text
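For the last point, an agent developer could strip common Markdown formatting from text before it is used as speech. The helper below is a minimal sketch under that assumption. `clean_for_speech` is a hypothetical name, and the rules cover only a few common markers, so it is not a complete Markdown parser.

```python
import re


def clean_for_speech(text: str) -> str:
    """Remove common Markdown markers so the text reads cleanly when spoken."""
    # Replace Markdown links [label](url) with just the label.
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Drop heading or bullet markers at the start of each line.
    text = re.sub(r"^\s*(#{1,6}|[-*+])\s+", "", text, flags=re.MULTILINE)
    # Remove emphasis and inline-code markers.
    text = re.sub(r"[*_`]", "", text)
    # Collapse whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_for_speech("**Sunny** today, see [details](https://example.com)."))
# Sunny today, see details.
```

Removing every underscore also alters identifiers such as snake_case names, which is usually acceptable for spoken text but worth knowing about.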
CLI Reference
Send voice message (user):
List available voices:
List supported languages:
Check agent status:
Platform Integration
Voice messaging integrates with all Mutiro features:
- Conversations: Voice messages appear in conversation history
- Desktop/Mobile: Native audio recording and playback
- Web: Browser-based voice support
- CLI: Command-line voice message sending
- Agent Runtime: Manages voice synthesis and delivery
All clients can send and receive voice messages seamlessly.
Privacy & Security
- Voice messages are encrypted in transit
- Audio files are stored securely in Google Cloud Storage
- Transcription happens server-side using Google's Speech-to-Text
- Synthesis uses Google's Chirp3 HD Text-to-Speech
- Only authorized conversation participants can access messages
- Voice data follows the same retention policies as text messages
Next Steps
- Try it out: Send your first voice message using the CLI or app
- Explore voices: Test different Chirp3 HD voices to find your favorite
- Build an agent: Create an agent that responds naturally to voice
- Check the SDK docs: Learn about programmatic voice message access
Voice messaging makes human-AI communication more natural and accessible. The platform handles all the complexity, so you can focus on building great conversational experiences!