Best practices for deploying AI agents securely
Mutiro enables powerful AI agent workflows, but with great power comes great responsibility. This guide outlines critical security practices to protect yourself, your data, and your systems when deploying AI agents.
The Lethal Trifecta
As outlined by Simon Willison in "The Lethal Trifecta", the most dangerous security vulnerability occurs when an AI agent combines all three of the following:
- Ingesting untrusted data (querying the web, reading emails, processing user input)
- Taking actions (sending emails, making API calls, executing commands)
- Operating without human oversight (fully autonomous operation)
When these three capabilities combine in a single agent, you create a vector for prompt injection attacks and unintended actions that could compromise your systems, leak sensitive data, or cause financial damage.
Why We Built Mutiro
One of the key challenges in deploying secure AI agents is implementing practical human-in-the-loop workflows. When you separate intake and action agents (as recommended in this guide), you need a simple way to review findings from one agent and decide whether to pass them to another.
The messaging-based approach:
Traditional agent frameworks often use complex approval APIs or require custom integration code. Mutiro treats agents as conversational participants. They're just contacts in your messaging app. This makes the human review step as simple as reading a message and deciding whether to forward it.
Example workflow:
- Your `research_agent` finds something and messages you
- You review it on your phone (or desktop)
- If approved, you forward the message to your `action_agent`
- The `action_agent` receives only what you explicitly sent
What this enables:
- Review agent findings on mobile (iOS/Android) or in the terminal (TUI)
- Forward messages between agents without writing code
- Maintain a searchable conversation history of all agent interactions
- Control information flow between agents at the message level
- Use mobile when on the go, terminal when at your desk
Important: Agent-to-agent communication is controlled by the tools you configure (e.g., via MCPs or custom integrations). The messaging interface is one way to implement human oversight, but agents can have direct communication channels if you configure them that way. The security best practices in this document apply regardless of how you connect your agents.
Separate Intake and Action Agents
Without this separation, an attacker could inject malicious prompts through web content, causing your agent to take harmful actions such as sending unauthorized emails.
Workflow:
- `research_agent` ingests and analyzes untrusted data
- `research_agent` sends summary/recommendations to YOU
- YOU review and decide what action to take
- YOU forward approved information to `executor_agent`
- `executor_agent` performs the action
This creates a mandatory human checkpoint between data ingestion and action execution.
Principle of Least Privilege
Only grant agents the minimum permissions they need. This includes limiting both the tools they can use and who can communicate with them.
Tool Configuration
Configure agents with only the tools necessary for their role:
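For example, an intake agent and an action agent can be given disjoint tool sets. A minimal sketch, assuming a hypothetical tool-list schema (the field names below are illustrative, not Mutiro's documented configuration):

```yaml
# research_agent -- ingests untrusted data, cannot act on the outside world
# (field names are hypothetical, for illustration only)
name: research_agent
tools:
  - web_search   # reads untrusted content
  - summarize    # analyzes and reports back to you
# deliberately no email, shell, or API tools
---
# executor_agent -- can act, but never touches untrusted data directly
name: executor_agent
tools:
  - send_email   # acts only on content you explicitly forward
```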
Access Control with Allowlists
Mutiro agents include an allowlist feature that controls who can send messages to your agent. This is your first line of defense against unauthorized access.
Configure in `.mutiro-agent.yaml`:
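A minimal sketch of what that file might contain (the `allowlist` key is inferred from this guide; check the Mutiro reference for the canonical schema):

```yaml
# .mutiro-agent.yaml -- allowlist sketch; key name assumed, not verified
allowlist:
  - alice        # exact username
  - team_*       # wildcard pattern
  - "!spam_*"    # negation: quoted so YAML doesn't read "!" as a tag
```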
Supported Patterns:
- Exact match: `alice` matches only "alice"
- Wildcard `*`: `team_*` matches "team_dev", "team_ops", etc.
- Single char `?`: `user_?` matches "user_1" and "user_a", but not "user_12"
- Negation `!`: `!spam_*` blocks any username starting with "spam_"
Typical configurations, sketched below:
- BEST: Specific users only
- GOOD: Team access with exceptions
- DEFAULT: Owner-only (most secure)
- CAUTION: Public access
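Hedged sketches of each tier, using the patterns above (the `allowlist` key name is assumed as before; the four snippets are alternatives, separated as YAML documents):

```yaml
# BEST: specific users only
allowlist:
  - alice
  - bob
---
# GOOD: team access with exceptions
allowlist:
  - team_*
  - "!team_intern"   # negation overrides the wildcard
---
# DEFAULT: omit the allowlist key entirely;
# only the agent owner can send messages
---
# CAUTION: public access -- pair with strict tool limits if you must
allowlist:
  - "*"
```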
Why this matters:
- Prevents unauthorized users from sending messages to your agent
- Blocks potential prompt injection from untrusted sources
- Failed access attempts are logged for security monitoring
- Default (no allowlist) = only the agent owner can send messages
Understanding the Trust Spectrum
The strict separation between intake and action agents represents the most secure approach, but it's not always the most practical. In reality, there's a spectrum based on how much you trust your data sources:
| Trust level | Typical sources | Agent capabilities |
| --- | --- | --- |
| High Trust | Internal APIs, your own DBs, trusted team | Some actions OK: send messages, update records, run queries |
| Medium Trust | Known contacts, verified sources, curated feeds | Limited actions: create tasks, save drafts, log events |
| Low Trust | Public web, general email, external APIs | Read-only + approval: analysis only, report to human |
| Zero Trust | Anonymous input, user submissions, social media | NO actions: analysis only, report to human, no external actions |
The less you trust the data source, the fewer action capabilities the agent should have. When in doubt, default to the strictest separation.
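One way to make the spectrum concrete is to encode it in per-agent tool grants. A hypothetical sketch (agent names and field names are invented for illustration):

```yaml
# Hypothetical per-trust-level tool grants; all names are illustrative.
agents:
  - name: internal_ops_agent    # High trust: internal APIs, your own DBs
    tools: [send_message, update_record, run_query]
  - name: feed_curator_agent    # Medium trust: curated feeds
    tools: [create_task, save_draft, log_event]
  - name: web_research_agent    # Low trust: public web
    tools: [analyze]            # read-only; findings go to a human
  - name: inbox_triage_agent    # Zero trust: anonymous submissions
    tools: [analyze]            # analysis only, never external actions
```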
Run Agents in Sandboxed Environments
Even with proper separation, agents should run in isolated, resource-limited environments to contain potential damage from compromised or misbehaving agents.
Security Settings Explained:
| Flag | Effect |
| --- | --- |
| `--network=none` | Completely isolate from the network (use for analysis-only agents) |
| `--network=bridge` | Allow network access (use minimal permissions for research agents) |
| `--security-opt=no-new-privileges` | Prevent privilege escalation |
| `--cap-drop=ALL` | Remove all Linux capabilities (most restrictive) |
| `--tmpfs /tmp:noexec,nosuid` | Writable temp dir, but no code execution |
| `-v path:/workspace:ro` | Read-only workspace prevents data modification |
| `-v path:/workspace` | Writable workspace, only for trusted action agents |
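Putting the flags together, a locked-down analysis-only agent might be launched like this (image name and host path are placeholders):

```bash
# Maximum isolation for an intake/analysis agent (placeholder image and path)
docker run --rm \
  --network=none \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --tmpfs /tmp:noexec,nosuid \
  -v "$PWD/data:/workspace:ro" \
  my-intake-agent:latest
```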
Best Practices Checklist
Agent Architecture
- Never combine untrusted data ingestion with action capabilities in one agent
- Always create separate intake and action agents
- Use the principle of least privilege: minimum necessary permissions
- Document your agent architecture and data flow
Sandboxing & Isolation
- Run agents in Docker containers or other sandboxed environments
- Restrict network access to only what's necessary
- Use read-only filesystem mounts where possible
- Apply security policies (no-new-privileges, seccomp)
Review & Monitoring
- Review all agent recommendations before executing actions
- Regularly audit agent capabilities and remove unnecessary tools
- Monitor agent activity logs for suspicious patterns
- Use Mutiro's mobile app to review agent requests when away from your desk
Team & Training
- Train team members on prompt injection risks
- Establish approval processes for production agent deployments
- Create an incident response plan for compromised agents
Conclusion
AI agents are powerful tools, but they must be architected with security in mind. The key principle is simple: separate data ingestion from action execution, with human review in between.
Important Caveat:
Following these guidelines makes you more secure, not perfectly safe. Security is about reducing risk, not eliminating it. These practices significantly lower your exposure to prompt injection attacks and unintended consequences, but no system is 100% secure. Stay vigilant, monitor your agents, and continuously reassess your security posture as threats evolve.
Questions or Concerns?
If you discover a security issue with Mutiro itself, please report it to: security@mutiro.com
For questions about securing your agent deployments, visit our community forum or contact support@mutiro.com