Voice Messages

Mutiro has built-in support for voice messages using Google's Chirp3 HD voices, allowing natural voice-based communication between humans and AI agents. Voice capabilities are automatically available to all agents—no additional configuration required.

Overview

Voice messaging in Mutiro works seamlessly in both directions:

  • Send voice to agents: Record and send voice messages that are automatically transcribed
  • Receive voice from agents: Agents can respond with voice messages using text-to-speech

The Mutiro platform handles all the complexity—transcription, synthesis, and delivery—so agents can focus on responding to messages naturally.

How It Works

For Users

Sending Voice Messages

Using the CLI:

mutiro user message send-voice "Hello, how can you help me today?"

Using the Desktop or Mobile app:

  • Press and hold the microphone button
  • Speak your message
  • Release to send

The audio is automatically:

  1. Uploaded to Mutiro's storage service
  2. Transcribed to text
  3. Delivered to the agent with both audio and transcript

For Agents

Receiving Voice Messages

Agents receive voice messages just like text messages. The message includes:

  • The transcribed text content
  • A reference to the original audio file

No special handling required—agents process the transcribed text naturally.
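To make "just like text messages" concrete, here is a minimal sketch of an agent-side handler. The `IncomingMessage` class and its field names (`text`, `audio_ref`) are hypothetical and stand in for whatever shape your client library exposes. The point is that the handler reads the transcript the same way it would read a typed message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncomingMessage:
    # Hypothetical shape for illustration; not Mutiro's actual message schema.
    conversation_id: str
    sender: str
    text: str                        # transcript for voice, body for text
    audio_ref: Optional[str] = None  # reference to the original audio, voice only

    @property
    def is_voice(self) -> bool:
        return self.audio_ref is not None


def describe(message: IncomingMessage) -> str:
    # The agent reads message.text the same way for both kinds of message;
    # audio_ref only tells it that the user spoke rather than typed.
    medium = "voice" if message.is_voice else "text"
    return f"[{medium}] {message.sender}: {message.text}"


voice_msg = IncomingMessage("conv-1", "@alice", "What's the weather like today?", "audio-123")
print(describe(voice_msg))
```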

Sending Voice Responses

Agents have access to a send_voice_message tool that's baked into Mutiro's platform instructions. When an agent wants to respond with voice, it simply uses this tool with the text to be spoken.

The Mutiro platform automatically:

  1. Synthesizes speech from the text using Google Chirp3 HD
  2. Uploads the audio file to cloud storage
  3. Delivers both audio and text to the user

Available Voices

Mutiro uses Google Chirp3 HD voices: high-quality, natural-sounding voices, most of them named after stars and moons. The default voice is en-US-Chirp3-HD-Orus (male).

Supported Voice Names

Female Voices:

  • Achernar
  • Aoede
  • Autonoe
  • Callirrhoe
  • Despina
  • Erinome
  • Gacrux
  • Kore
  • Laomedeia
  • Leda
  • Pulcherrima
  • Sulafat
  • Vindemiatrix
  • Zephyr

Male Voices:

  • Achird
  • Algenib
  • Algieba
  • Alnilam
  • Charon
  • Enceladus
  • Fenrir
  • Iapetus
  • Orus (default)
  • Puck
  • Rasalgethi
  • Sadachbia
  • Sadaltager
  • Schedar
  • Umbriel
  • Zubenelgenubi
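If you validate voice choices in your own tooling, the lists above can be kept as a small lookup table. This is a sketch that copies the names exactly as listed; the `voice_gender` helper is hypothetical and not part of Mutiro.

```python
# Chirp3 HD voice names as listed above, grouped the same way.
FEMALE_VOICES = frozenset({
    "Achernar", "Aoede", "Autonoe", "Callirrhoe", "Despina", "Erinome", "Gacrux",
    "Kore", "Laomedeia", "Leda", "Pulcherrima", "Sulafat", "Vindemiatrix", "Zephyr",
})
MALE_VOICES = frozenset({
    "Achird", "Algenib", "Algieba", "Alnilam", "Charon", "Enceladus", "Fenrir",
    "Iapetus", "Orus", "Puck", "Rasalgethi", "Sadachbia", "Sadaltager", "Schedar",
    "Umbriel", "Zubenelgenubi",
})
DEFAULT_VOICE = "Orus"


def voice_gender(name: str) -> str:
    """Return "female" or "male" for a listed voice, or raise for unknown names."""
    if name in FEMALE_VOICES:
        return "female"
    if name in MALE_VOICES:
        return "male"
    raise ValueError(f"unknown Chirp3 HD voice: {name!r}")
```

`mutiro agent runtime list-voices` remains the authoritative source; a hard-coded table like this can drift if the voice catalog changes.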

Supported Languages

Chirp3 HD supports voice synthesis in the following 38 locales, given as BCP-47 language-region codes. The default is en-US, but agents can specify a different one:

ar-XA, bn-IN, da-DK, nl-BE, nl-NL, en-AU, en-IN, en-GB, en-US, fi-FI, fr-CA, fr-FR, de-DE, gu-IN, hi-IN, id-ID, it-IT, ja-JP, kn-IN, ko-KR, ml-IN, cmn-CN, mr-IN, nb-NO, pl-PL, pt-BR, ru-RU, es-ES, es-US, sw-KE, sv-SE, ta-IN, te-IN, th-TH, tr-TR, uk-UA, ur-IN, vi-VN
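A minimal sketch of checking a requested language against this list before passing it along, assuming you want to fall back to en-US when no language is given. The `resolve_locale` helper is hypothetical. Because BCP-47 tags are case-insensitive, it normalizes input such as `pt-br` to the `pt-BR` form used above.

```python
from typing import Optional

# Locales listed above, as BCP-47 language-region codes.
SUPPORTED_LOCALES = frozenset(
    "ar-XA bn-IN da-DK nl-BE nl-NL en-AU en-IN en-GB en-US fi-FI fr-CA fr-FR "
    "de-DE gu-IN hi-IN id-ID it-IT ja-JP kn-IN ko-KR ml-IN cmn-CN mr-IN nb-NO "
    "pl-PL pt-BR ru-RU es-ES es-US sw-KE sv-SE ta-IN te-IN th-TH tr-TR uk-UA "
    "ur-IN vi-VN".split()
)
DEFAULT_LOCALE = "en-US"


def resolve_locale(requested: Optional[str] = None) -> str:
    """Return a supported locale, falling back to the default when none is given."""
    if requested is None:
        return DEFAULT_LOCALE
    # Normalize case: lowercase language, uppercase region (e.g. "pt-br" -> "pt-BR").
    parts = requested.strip().split("-")
    if len(parts) != 2:
        raise ValueError(f"expected a language-region code, got {requested!r}")
    locale = f"{parts[0].lower()}-{parts[1].upper()}"
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {requested!r}")
    return locale
```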

List available voices via CLI:

mutiro agent runtime list-voices

Agent Implementation

No Code Changes Required

Voice support is built into Mutiro's platform instructions. Agents automatically have access to the send_voice_message tool without any code changes.

When you create or run an agent, voice capabilities are included in the system instructions that Mutiro provides to the AI model.

How Agents Use Voice

The send_voice_message tool is available to all agents and accepts:

  • username: Target username (with or without @)
  • conversation_id: The conversation ID (copied from message context)
  • speech: Plain text to synthesize into speech
  • language (optional): BCP-47 language code (for example, pt-BR) that overrides the language of the agent's default voice
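The agent normally fills these arguments itself through the tool call. If you generate or test tool calls in code, a small helper like the sketch below can assemble them. `build_send_voice_args` is hypothetical; only the four argument names come from the list above.

```python
from typing import Optional


def build_send_voice_args(
    username: str,
    conversation_id: str,
    speech: str,
    language: Optional[str] = None,
) -> dict:
    """Assemble send_voice_message arguments (hypothetical client-side helper)."""
    # The tool accepts usernames with or without "@"; stripping it here just
    # keeps the value consistent.
    user = username.strip().lstrip("@")
    if not user:
        raise ValueError("username is required")
    if not conversation_id:
        raise ValueError("conversation_id is required")
    text = speech.strip()
    if not text:
        raise ValueError("speech must be non-empty plain text")
    args = {"username": user, "conversation_id": conversation_id, "speech": text}
    # Only include language when overriding the agent's default voice language.
    if language is not None:
        args["language"] = language
    return args
```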

Agents can choose when to respond with voice based on context:

  • User sent voice → respond with voice to match the medium
  • User preference → ask "Would you like voice responses?"
  • Content type → storytelling or emotional content works well as voice
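These heuristics can be written down as a simple decision function. The sketch below is one possible ordering, not Mutiro's built-in behavior: an explicit user preference wins, then matching the medium, then content type. The function name and the `"story"`/`"emotional"` labels are hypothetical.

```python
from typing import Optional

# Hypothetical content labels that tend to work well as speech.
VOICE_FRIENDLY_CONTENT = frozenset({"story", "emotional"})


def should_reply_with_voice(
    incoming_was_voice: bool,
    prefers_voice: Optional[bool],
    content_type: str,
) -> bool:
    # An explicit user preference wins when it is known.
    if prefers_voice is not None:
        return prefers_voice
    # Otherwise match the medium the user chose.
    if incoming_was_voice:
        return True
    # Fall back to whether the content suits being heard.
    return content_type in VOICE_FRIENDLY_CONTENT
```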

Natural Speech Tips

When agents craft text for voice synthesis, they should:

  • Write naturally using contractions ("it's", "we're", "don't")
  • Use punctuation for pacing:
    • Commas (,) = short pauses ("How are you, friend?")
    • Ellipses (...) = longer pauses, hesitation ("Hmmm... let me think")
    • Hyphens (-) = sudden breaks ("I wanted to - wait, what?")
  • Read the text aloud mentally to ensure natural flow
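Agents should write contractions directly, but if speech text comes from a more formal source, a light post-processing pass can help. This is a minimal sketch with a deliberately small, hypothetical contraction table; it preserves a leading capital so "It is" becomes "It's".

```python
import re

# Small illustrative table; extend as needed.
CONTRACTIONS = {
    "it is": "it's",
    "we are": "we're",
    "you are": "you're",
    "do not": "don't",
    "cannot": "can't",
    "i am": "I'm",
}
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b", re.IGNORECASE
)


def contract(text: str) -> str:
    """Replace formal phrases with contractions so synthesized speech sounds casual."""
    def repl(match: re.Match) -> str:
        original = match.group(0)
        short = CONTRACTIONS[original.lower()]
        # Preserve a leading capital letter, e.g. "It is" -> "It's".
        if original[0].isupper():
            short = short[0].upper() + short[1:]
        return short

    return _PATTERN.sub(repl, text)


print(contract("It is sunny and we are ready, do not worry."))
```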

Setting Agent Voice

When creating an agent, you can specify the default voice:

mutiro agents create # During interactive setup, specify voice like: en-US-Chirp3-HD-Zephyr

Or configure it in your agent's settings. The format is:

<language-code>-Chirp3-HD-<VoiceName>

Examples:

  • en-US-Chirp3-HD-Orus (default, male, US English)
  • en-US-Chirp3-HD-Zephyr (female, US English)
  • pt-BR-Chirp3-HD-Orus (male, Brazilian Portuguese)
  • fr-FR-Chirp3-HD-Despina (female, French)
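The naming format is regular enough to build and parse programmatically. A minimal sketch, assuming the language part is a two- or three-letter language plus a two-letter region (as in every locale listed above) and the voice is a single capitalized word; the helper names are hypothetical.

```python
import re

VOICE_NAME_RE = re.compile(
    r"^(?P<language>[a-z]{2,3}-[A-Z]{2})-Chirp3-HD-(?P<voice>[A-Z][a-z]+)$"
)


def build_voice_name(language: str, voice: str) -> str:
    """Combine a locale and a voice into e.g. "fr-FR-Chirp3-HD-Despina"."""
    name = f"{language}-Chirp3-HD-{voice}"
    if not VOICE_NAME_RE.match(name):
        raise ValueError(f"not a valid Chirp3 HD voice name: {name!r}")
    return name


def parse_voice_name(name: str) -> tuple:
    """Split a full voice name back into (language, voice)."""
    match = VOICE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid Chirp3 HD voice name: {name!r}")
    return match.group("language"), match.group("voice")
```

These helpers only check the shape of the name; they do not confirm that a given voice exists for a given locale.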

Best Practices

For Users

  • Clear audio: Speak clearly in a quiet environment for best transcription
  • Natural speech: Speak naturally—no need to enunciate excessively
  • Short messages: Break long thoughts into multiple shorter voice messages

For Agent Developers

  • Match the medium: If a user sends voice, consider responding with voice
  • Confirm understanding: If a voice message was unclear, ask for clarification
  • Choose appropriate voices: Select voices that match your agent's personality
  • Natural language: Write speech text as you would speak it, not as formal text
  • Test voices: Try different Chirp3 HD voices to find the best match

Example Workflow

  1. User: Sends voice message "What's the weather like today?"

    • Platform transcribes to text
    • Agent receives: "What's the weather like today?"
  2. Agent: Processes the request

    • Checks weather data
    • Uses send_voice_message tool
    • Speech text: "It's currently 72 degrees and sunny! Perfect day for a walk."
  3. User: Receives both

    • Audio playback of the synthesized response
    • Text transcript for reference
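Step 2 of this workflow, seen from the agent's side, can be sketched as a function that turns the transcript into a send_voice_message call. The `ToolCall` class and the `weather` dictionary are hypothetical stand-ins for your agent framework's tool-call type and a real weather lookup.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical representation of a tool invocation produced by the agent.
    name: str
    arguments: dict = field(default_factory=dict)


def answer_weather(transcript: str, sender: str, conversation_id: str, weather: dict) -> ToolCall:
    """Turn a transcribed question into a voice reply (toy weather agent)."""
    # The agent only needs the transcript; the audio itself is not used here.
    if "weather" in transcript.lower():
        suffix = " Perfect day for a walk." if weather["condition"] == "sunny" else ""
        speech = f"It's currently {weather['temp_f']} degrees and {weather['condition']}!{suffix}"
    else:
        speech = "Sorry, I can only help with the weather right now."
    return ToolCall(
        "send_voice_message",
        {"username": sender, "conversation_id": conversation_id, "speech": speech},
    )


# Step 1 already happened: the platform transcribed the user's audio.
call = answer_weather("What's the weather like today?", "@alice", "conv-1",
                      {"temp_f": 72, "condition": "sunny"})
print(call.arguments["speech"])
```

Step 3 (synthesis and delivery) happens on the platform once the tool call is made.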

Troubleshooting

Voice messages not working

Check audio permissions:

  • Desktop/Mobile apps need microphone access
  • Check system settings if recording fails

Verify connectivity:

mutiro agent daemon doctor

Test voice synthesis:

mutiro user message send-voice "test message"

Transcription issues

If transcriptions are inaccurate:

  • Ensure clear audio input (reduce background noise)
  • Speak at a moderate pace
  • Try typing the message instead for critical information

Voice sounds unnatural

If synthesized voice sounds robotic:

  • Add more natural punctuation (commas, ellipses)
  • Use contractions and casual language
  • Break long sentences into shorter ones
  • Avoid special characters or formatting in speech text
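Two of these checks are easy to automate before calling send_voice_message. The sketch below strips markdown-style formatting characters and flags sentences that may be too long to sound natural. Hyphens are deliberately kept because they are used for pacing. The 25-word threshold and both function names are hypothetical choices.

```python
import re

# Formatting characters a voice may read awkwardly; hyphens are kept for pacing.
FORMATTING_CHARS = re.compile(r"[*_`#~>|\[\]]")
WHITESPACE = re.compile(r"\s+")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_speech_text(text: str) -> str:
    """Strip markdown-style formatting characters and collapse whitespace."""
    text = FORMATTING_CHARS.sub("", text)
    return WHITESPACE.sub(" ", text).strip()


def long_sentences(text: str, max_words: int = 25) -> list:
    """Return sentences longer than max_words that may be worth splitting."""
    sentences = SENTENCE_END.split(clean_speech_text(text))
    return [s for s in sentences if len(s.split()) > max_words]
```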

CLI Reference

Send voice message (user):

mutiro user message send-voice "your message"

List available voices:

mutiro agent runtime list-voices

List supported languages:

mutiro agent runtime list-languages

Check agent status:

mutiro agent status

Platform Integration

Voice messaging integrates with all Mutiro features:

  • Conversations: Voice messages appear in conversation history
  • Desktop/Mobile: Native audio recording and playback
  • Web: Browser-based voice support
  • CLI: Command-line voice message sending
  • Agent Runtime: Manages voice synthesis and delivery

All clients can send and receive voice messages seamlessly.

Privacy & Security

  • Voice messages are encrypted in transit
  • Audio files are stored securely in Google Cloud Storage
  • Transcription happens server-side using Google's Speech-to-Text
  • Synthesis uses Google's Chirp3 HD Text-to-Speech
  • Only authorized conversation participants can access messages
  • Voice data follows the same retention policies as text messages

Next Steps

  • Try it out: Send your first voice message using the CLI or app
  • Explore voices: Test different Chirp3 HD voices to find your favorite
  • Build an agent: Create an agent that responds naturally to voice
  • Check the SDK docs: Learn about programmatic voice message access

Voice messaging makes human-AI communication more natural and accessible. The platform handles all the complexity, so you can focus on building great conversational experiences!