Voice Pairing Guide

Talk to your AI agent with voice - both ways

Voice Features: Speak to your agent and hear responses. Voice transcription is automatic, and you can customize the voice that responds to you.

Prerequisites

Before starting, make sure you have:

Step 1: Configure Your Agent's Voice

Open your agent's configuration file in your project directory:

cd ~/your-project
open .mutiro-agent.yaml

Add the tts_voice field to your configuration:

name: Claude Assistant
engine: claude
tts_voice: "en-US-Chirp3-HD-Algieba"

About Voice Selection:

Mutiro uses Google's Chirp3 HD voices for natural-sounding speech. Choose from dozens of voices in different languages, genders, and styles. See available voices below.

Step 2: Choose Your Voice

Select a voice that matches your agent's personality. Here are some popular options:

  • Algieba (Male), English (US): en-US-Chirp3-HD-Algieba
  • Kore (Female), English (US): en-US-Chirp3-HD-Kore
  • Charon (Male), English (US): en-US-Chirp3-HD-Charon
  • Leda (Female), English (US): en-US-Chirp3-HD-Leda
  • Puck (Male), English (US): en-US-Chirp3-HD-Puck
  • Zephyr (Female), English (US): en-US-Chirp3-HD-Zephyr

More Voices Available:

Chirp3 HD supports voices in multiple languages including Spanish, French, German, Japanese, and more. Each voice has unique characteristics - some are warm and friendly, others are clear and professional.

View the complete voice catalog with audio samples: Google Cloud Chirp3 HD Voices →
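The IDs listed above all share one shape: a BCP-47 locale, the literal Chirp3-HD, and a voice name. A small helper can compose them, with the caveat that this pattern is inferred from the examples in this guide, not from a documented naming contract:

```python
def chirp3_voice_id(locale: str, name: str) -> str:
    """Compose a Chirp3 HD voice ID as <locale>-Chirp3-HD-<Name>.

    The pattern is inferred from the voice list above; confirm any
    composed ID against the Google Cloud voice catalog before using it,
    since not every locale/name combination necessarily exists.
    """
    return f"{locale}-Chirp3-HD-{name}"

print(chirp3_voice_id("en-US", "Kore"))  # en-US-Chirp3-HD-Kore
```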

Step 3: Restart Your Agent

After updating your configuration, restart the agent daemon to apply the new voice:

mutiro start

Your agent will now use the configured voice for all audio responses.

Step 4: Test Voice Interaction

Open the Mutiro mobile app and try voice messaging:

  1. Tap the microphone button in the chat interface
  2. Speak your message to your agent
  3. Your voice is automatically transcribed and sent to the agent
  4. The agent's response comes back as both text and audio

Automatic Transcription:

Mutiro automatically transcribes your voice messages using advanced speech recognition. You don't need any additional configuration - it just works!

Step 5: Customize Voice Responses (Optional)

You can customize how your agent responds to voice messages by adding instructions to your agent configuration:

Option 1: Using Prompt Append (Claude)

Add a prompt_append field to guide Claude's responses:

name: Claude Assistant
engine: claude
tts_voice: "en-US-Chirp3-HD-Algieba"
prompt_append: |
  MUTIRO INTERACTION CONTEXT:
  - Mutiro is a messaging app like WhatsApp - conversational and mobile-friendly
  - Users are often on-the-go, driving, or multitasking
  - NEVER mix voice and text in one response - choose ONE mode

  WHEN TO USE VOICE (wrap in <voice> tags):
  - Explanations, discussions, brainstorming
  - Teaching or walking through concepts
  - Natural conversations and back-and-forth dialogue
  - Anything that benefits from a conversational, human tone

  WHEN TO USE TEXT (no voice tags):
  - Quick factual answers (commands, URLs, short status updates)
  - Code snippets or diffs
  - Lists of items or configuration values
  - Brief confirmations or acknowledgments

  Example voice response (explaining a concept):
  <voice>So here's how authentication works in your app. When a user logs in, we first check their credentials against the database. If they match, we generate a JWT token that includes their user ID and permissions. This token gets sent back to the client and stored in local storage. Then, for every subsequent request, the client sends that token in the authorization header, and our middleware validates it before allowing access to protected routes. Make sense?</voice>

  Example text response (quick info):
  Deployment URL: https://your-app-abc123.vercel.app
  Status: Live

Option 2: Using Genie Persona

For more advanced customization, create a Genie persona that defines your agent's voice personality:

# .genie/personas/voice_assistant/prompt.yaml
name: Voice Assistant
instruction: |
  You are a helpful AI assistant communicating through Mutiro, a messaging
  app like WhatsApp. Users are often mobile, multitasking, or on-the-go.

  CRITICAL RULE - CHOOSE ONE MODE PER RESPONSE:
  - Use VOICE for explanations, discussions, teaching, brainstorming
  - Use TEXT for brief factual info, code, commands, quick status updates
  - NEVER mix voice tags and text in the same response

  VOICE RESPONSES (wrap in <voice> tags):
  - Natural, conversational tone like talking to a colleague
  - Can be longer - multiple sentences or even paragraphs
  - Great for explaining concepts, walking through ideas
  - Use when the conversation is exploratory or educational

  TEXT RESPONSES (no voice tags):
  - Keep it brief - this is like texting
  - Perfect for: URLs, code snippets, command outputs, lists
  - Avoid long text messages or large file diffs
  - Think WhatsApp-style brevity

  Example voice response (brainstorming):
  <voice>Okay, so I'm thinking about your architecture question. You could go with a monolithic approach initially, which would be simpler to deploy and manage. But if you're expecting to scale quickly, a microservices setup might save you pain later. The trade-off is complexity upfront versus flexibility down the road. Given your team size and timeline, I'd lean toward starting monolithic and breaking things out later when you hit actual scaling issues. What do you think?</voice>

  Example text response (quick command):
  Run this to fix the build:
  npm cache clean --force && npm install

Mutiro is Like WhatsApp for AI:

Think of Mutiro as a messaging app where you chat with your AI agent. Use voice for conversations - explaining ideas, brainstorming, discussing concepts. These can be longer and more natural. Use text for quick, practical exchanges - URLs, short code snippets, commands, status updates.

Important: Never mix both modes in one response. Avoid sending huge text blocks or file diffs (document support coming later). Keep text responses WhatsApp-brief, and use voice when you need to actually explain or explore something.
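The one-mode rule implies that a reply is either wrapped entirely in <voice> tags or contains none at all. A hypothetical parser, illustrating how that convention could be applied on the receiving side (this is not Mutiro's actual implementation):

```python
import re

# A reply counts as voice only if the <voice> tags span the whole message.
VOICE_RE = re.compile(r"\A\s*<voice>(.*)</voice>\s*\Z", re.DOTALL)

def classify_reply(reply: str) -> tuple[str, str]:
    """Return ("voice", spoken_text) if the whole reply is voice-tagged,
    otherwise ("text", reply). Mixed-mode replies fall through as text."""
    m = VOICE_RE.match(reply)
    if m:
        return "voice", m.group(1).strip()
    return "text", reply.strip()

print(classify_reply("<voice>Here is how auth works.</voice>"))
print(classify_reply("Status: Live"))
```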

Voice is Ready!

You can now have natural voice conversations with your AI agent from anywhere.

Pro Tips:

  • Try different voices to find one that matches your agent's personality
  • Customize voice behavior with prompt_append or Genie personas for better voice interactions
  • Voice works in all languages supported by Chirp3 HD - experiment with multilingual agents
  • Each agent can have a different voice - customize each one individually
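For example, agents in two different projects could each define their own voice. The project paths and agent names below are illustrative, not taken from this guide:

```yaml
# ~/project-a/.mutiro-agent.yaml
name: Code Reviewer
engine: claude
tts_voice: "en-US-Chirp3-HD-Charon"

# ~/project-b/.mutiro-agent.yaml
name: Research Buddy
engine: claude
tts_voice: "en-US-Chirp3-HD-Zephyr"
```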