Security Guidelines

Best practices for deploying AI agents securely

Mutiro enables powerful AI agent workflows, but with great power comes great responsibility. This guide outlines critical security practices to protect yourself, your data, and your systems when deploying AI agents.

The Lethal Trifecta

As outlined by Simon Willison in "The Lethal Trifecta" , the most dangerous security vulnerability occurs when an AI agent has all three capabilities:

  1. Ingesting untrusted data (query the web, read emails, process user input)
  2. Taking actions (send emails, make API calls, execute commands)
  3. No human oversight (autonomous operation)

When these three capabilities combine in a single agent, you create a vector for prompt injection attacks and unintended actions that could compromise your systems, leak sensitive data, or cause financial damage.

Why We Built Mutiro

One of the key challenges in deploying secure AI agents is implementing practical human-in-the-loop workflows. When you separate intake and action agents (as recommended in this guide), you need a simple way to review findings from one agent and decide whether to pass them to another.

The messaging-based approach:

Traditional agent frameworks often use complex approval APIs or require custom integration code. Mutiro treats agents as conversational participants. They're just contacts in your messaging app. This makes the human review step as simple as reading a message and deciding whether to forward it.

Example workflow:

  1. Your research_agent finds something and messages you
  2. You review it on your phone (or desktop)
  3. If approved, you forward the message to your action_agent
  4. The action_agent receives only what you explicitly sent

What this enables:

  • Review agent findings on mobile (iOS/Android) or in the terminal (TUI)
  • Forward messages between agents without writing code
  • Maintain a searchable conversation history of all agent interactions
  • Control information flow between agents at the message level
  • Use mobile when on the go, terminal when at your desk

Important: Agent-to-agent communication is controlled by the tools you configure (e.g., via MCPs or custom integrations). The messaging interface is one way to implement human oversight, but agents can have direct communication channels if you configure them that way. The security best practices in this document apply regardless of how you connect your agents.

1

Separate Intake and Action Agents

DANGEROUS: Single Agent with Full Access

# Don't do this - one agent with both intake and action capabilities mutiro agents create dangerous_agent "Dangerous Agent" --engine genie # Then configure it with BOTH web search AND email sending tools # (This creates a prompt injection risk)

An attacker could inject malicious prompts through web content, causing your agent to take harmful actions like sending unauthorized emails.

SAFE: Separate Agents with Human in the Loop

# Create an intake agent - configure it with only read/analysis tools mutiro agents create research_agent "Research Agent" --engine genie # In .mutiro-agent.yaml: Configure Genie persona with web_search, read tools only # Create an action agent - configure it with only execution tools mutiro agents create executor_agent "Executor Agent" --engine claude # In .mutiro-agent.yaml: Configure Claude with MCP servers for email, code execution

Workflow:

  1. research_agent ingests and analyzes untrusted data
  2. research_agent sends summary/recommendations to YOU
  3. YOU review and decide what action to take
  4. YOU forward approved information to executor_agent
  5. executor_agent performs the action

This creates a mandatory human checkpoint between data ingestion and action execution.

2

Principle of Least Privilege

Only grant agents the minimum permissions they need. This includes limiting both the tools they can use and who can communicate with them.

Tool Configuration

Configure agents with only the tools necessary for their role:

# Intake agent - configure with read-only tools mutiro agents create data_collector "Data Collector" --engine genie # In .mutiro-agent.yaml: Configure persona with web_search, read_documents only # Action agent - configure with limited, specific actions mutiro agents create task_executor "Task Executor" --engine claude # In .mutiro-agent.yaml: Configure MCP server for messaging only (no code execution)

Access Control with Allowlists

Mutiro agents include an allowlist feature that controls who can send messages to your agent. This is your first line of defense against unauthorized access.

Configure in .mutiro-agent.yaml:

agent: username: my_research_agent api_key: ${MUTIRO_API_KEY} # Specify which users can message this agent allowlist: - alice # Exact username - bob - team_* # Pattern: all usernames starting with team_ - "!spam_user" # Negation: block this specific user

Supported Patterns:

  • Exact match: alice matches only "alice"
  • Wildcard `*`: team_* matches "team_dev", "team_ops", etc.
  • Single char `?`: user_? matches "user_1", "user_a", but not "user_12"
  • Negation `!`: !spam_* blocks any username starting with "spam_"

BEST: Specific users only

allowlist: - alice - bob

GOOD: Team access with exceptions

allowlist: - team_* - "!team_intern"

DEFAULT: Owner-only (most secure)

# Omit allowlist field entirely # Only agent owner can message

CAUTION: Public access

allowlist: - "*" # Anyone can message

How Allowlists Protect You:

  • • Prevents unauthorized users from sending messages to your agent
  • • Blocks potential prompt injection from untrusted sources
  • • Failed access attempts are logged for security monitoring
  • • Default (no allowlist) = only agent owner can send messages
3

Understanding the Trust Spectrum

The strict separation between intake and action agents represents the most secure approach, but it's not always the most practical. In reality, there's a spectrum based on how much you trust your data sources:

High Trust

Internal APIs, your own DBs, trusted team

Some actions OK:

  • Send messages
  • Update records
  • Run queries

Medium Trust

Known contacts, verified sources, curated feeds

Limited actions:

  • Create tasks
  • Save drafts
  • Log events

Low Trust

Public web, general email, external APIs

Read-only + approval:

  • Analysis only
  • Report to human

Zero Trust

Anonymous input, user submissions, social media

NO actions:

  • Analysis only
  • Report to human
  • No external actions

Key principle:

The less you trust the data source, the fewer action capabilities the agent should have. When in doubt, default to the strictest separation.

4

Run Agents in Sandboxed Environments

Even with proper separation, agents should run in isolated, resource-limited environments to contain potential damage from compromised or misbehaving agents.

RECOMMENDED: Docker Container with Security Hardening

# Example 1: Analysis agent (NO network access - most secure) docker run -d \ --name mutiro-analysis-agent \ --network=none \ --security-opt=no-new-privileges \ --cap-drop=ALL \ --tmpfs /tmp:noexec,nosuid \ -e MUTIRO_API_KEY=your_api_key \ -e GEMINI_API_KEY=your_gemini_key \ -v /path/to/workspace:/workspace:ro \ mutirolabs/agent:genie # Example 2: Research agent (network access required - restricted) docker run -d \ --name mutiro-research-agent \ --network=bridge \ --security-opt=no-new-privileges \ --cap-drop=ALL \ --tmpfs /tmp:noexec,nosuid \ -e MUTIRO_API_KEY=your_api_key \ -e GEMINI_API_KEY=your_gemini_key \ -v /path/to/workspace:/workspace:ro \ mutirolabs/agent:genie # Example 3: Action agent (write access needed - use volume isolation) docker run -d \ --name mutiro-action-agent \ --network=bridge \ --security-opt=no-new-privileges \ --cap-drop=ALL \ --tmpfs /tmp:noexec,nosuid \ -e MUTIRO_API_KEY=your_api_key \ -e ANTHROPIC_API_KEY=your_anthropic_key \ -v /path/to/isolated/workspace:/workspace \ mutirolabs/agent:claude

Security Settings Explained:

--network=none

Completely isolate from network (use for analysis-only agents)

--network=bridge

Allow network access (use minimal permissions for research agents)

--security-opt=no-new-privileges

Prevent privilege escalation

--cap-drop=ALL

Remove all Linux capabilities (most restrictive)

--tmpfs /tmp:noexec,nosuid

Writable temp dir but no code execution

-v path:/workspace:ro

Read-only workspace prevents data modification

-v path:/workspace

Writable workspace only for trusted action agents

Best Practices Checklist

Agent Architecture

  • Never combine untrusted data ingestion with action capabilities in one agent
  • Always create separate intake and action agents
  • Use the principle of least privilege - minimum necessary permissions
  • Document your agent architecture and data flow

Sandboxing & Isolation

  • Run agents in Docker containers or other sandboxed environments
  • Restrict network access to only what's necessary
  • Use read-only filesystem mounts where possible
  • Apply security policies (no-new-privileges, seccomp)

Review & Monitoring

  • Review all agent recommendations before executing actions
  • Regularly audit agent capabilities and remove unnecessary tools
  • Monitor agent activity logs for suspicious patterns
  • Use Mutiro's mobile app to review agent requests when away from desk

Team & Training

  • Train team members on prompt injection risks
  • Establish approval processes for production agent deployments
  • Create incident response plan for compromised agents

Conclusion

AI agents are powerful tools, but they must be architected with security in mind. The key principle is simple: separate data ingestion from action execution, with human review in between.

Important Caveat:

Following these guidelines makes you more secure, not perfectly safe. Security is about reducing risk, not eliminating it. These practices significantly lower your exposure to prompt injection attacks and unintended consequences, but no system is 100% secure. Stay vigilant, monitor your agents, and continuously reassess your security posture as threats evolve.

Questions or Concerns?

If you discover a security issue with Mutiro itself, please report it to: security@mutiro.com

For questions about securing your agent deployments, visit our community forum or contact support@mutiro.com