You deploy a multi-agent system on Friday. Monday morning your CFO sends a Slack: “Why did we spend $4,000 over the weekend?” One agent got stuck in a loop. Nobody stopped it. No audit trail. Nothing. This is happening right now at hundreds of companies. Nanny is the execution boundary that prevents it.

Nanny is an open-source, deterministic execution boundary for autonomous agents and multi-agent systems. You tell it how far each agent is allowed to go — in steps, cost units, wall-clock time, and which tools it can touch — and the moment any limit is crossed, Nanny kills the process immediately and emits a structured log saying exactly what happened and why. No grace period. No soft warnings. No recovery logic. No negotiation. That boundary is deterministic, auditable, and structurally impossible for any agent to bypass.

What nanny guarantees

When you run an agent under nanny, these three things are true:
  • It will not take more steps than you allow.
  • It will not spend more than your cost budget.
  • It will not run longer than your timeout.
If any limit is breached, Nanny kills the process immediately — the agent cannot catch, delay, or prevent the stop. An ExecutionStopped event is emitted with the exact reason, and Nanny exits with a non-zero status code.
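The three guarantees can be illustrated with a minimal, self-contained sketch — not the actual Nanny implementation, just the enforcement model it describes: a supervisor tracks steps, cost, and elapsed wall-clock time, and the first breached ceiling triggers a hard stop carrying the exact reason.

```python
import time

class ExecutionStopped(Exception):
    """Hard stop carrying the exact reason, mirroring nanny's structured event."""
    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason

class Boundary:
    """Illustrative sketch of the three guarantees: steps, cost, wall-clock time."""
    def __init__(self, max_steps, max_cost, timeout_s):
        self.max_steps, self.max_cost, self.timeout_s = max_steps, max_cost, timeout_s
        self.steps = 0
        self.cost = 0.0
        self.started = time.monotonic()

    def record(self, cost=0.0):
        # Account for one step, then check every ceiling.
        self.steps += 1
        self.cost += cost
        if self.steps > self.max_steps:
            raise ExecutionStopped(f"step limit exceeded: {self.steps} > {self.max_steps}")
        if self.cost > self.max_cost:
            raise ExecutionStopped(f"cost budget exceeded: {self.cost} > {self.max_cost}")
        if time.monotonic() - self.started > self.timeout_s:
            raise ExecutionStopped("timeout exceeded")

b = Boundary(max_steps=3, max_cost=100, timeout_s=60)
b.record(cost=40)      # within budget
b.record(cost=40)      # within budget
try:
    b.record(cost=40)  # cumulative cost crosses the ceiling: hard stop
except ExecutionStopped as e:
    print(e.reason)
```

The real runtime enforces this from outside the agent process, which is why the agent cannot catch or delay the stop; the sketch above only shows the accounting.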

What nanny is not

Nanny is not intelligent. It does not understand what your agent is doing, why it is doing it, or whether the result is good. It does not suggest better limits, summarise results, adapt based on context, or retry on failure. It is a primitive — a hard boundary you configure once and trust completely. This is intentional. The value comes from the guarantee: if you set a limit, it holds.

Who it is for

Nanny is for developers and teams running agents in production — or preparing to. It is a good fit if you:
  • Are building multi-agent systems where different agents have different roles, tool access, and budget ceilings — and you need enforcement that fires per-role, not just globally
  • Are running autonomous agents that call external tools, browse the web, or write to APIs
  • Need hard guarantees that an agent cannot exceed a cost budget or run indefinitely
  • Want a structured audit trail of every tool call and stop reason for every execution
  • Want enforcement that is not tied to any agent framework — use CrewAI, LangChain, LangGraph, or any Python or Rust agent framework without lock-in

The multi-agent scenario

A fintech team builds a system where a manager agent spawns 12 specialists: one checks regulations, one pulls market data, one drafts reports. They deploy on Friday. One agent gets stuck looping on a market data API call over the weekend. The team has no per-role kill switch and no audit trail of which agent made which call. With Nanny, each specialist has its own named limit set in nanny.toml:
[limits.analysis]
steps   = 60
cost    = 200     # tight — this agent makes expensive calls
timeout = 60000

[limits.reporter]
steps   = 20
cost    = 50      # loose — this agent just writes a file
timeout = 30000
The analysis agent activates [limits.analysis] when it runs. The reporter activates [limits.reporter]. Each has its own tool allowlist — the analysis agent cannot call write_report, the reporter cannot call compute_stats. The moment any agent exceeds its ceiling or reaches for the wrong tool, Nanny stops it. The event log shows exactly which agent, which tool, which limit, and when.

Scope today: this works for any multi-agent framework that runs agents within a single process — CrewAI, LangGraph, AutoGen, plain Python. See examples/python/metrics_crew for the complete working example. Cross-process and cross-machine fleet enforcement is the v0.1.6 cloud layer.
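The per-role allowlist behaviour can be sketched in plain Python. This is illustrative only, not the SDK API: the limit sets mirror the nanny.toml above, the tool names write_report and compute_stats come from the scenario, and fetch_market_data is an assumed tool name added for the example.

```python
# Named limit sets with tool allowlists, mirroring [limits.analysis] and
# [limits.reporter] above. Illustrative model, not the nanny API.
LIMITS = {
    "analysis": {"steps": 60, "cost": 200, "timeout_ms": 60000,
                 "tools": {"compute_stats", "fetch_market_data"}},  # fetch_market_data is assumed
    "reporter": {"steps": 20, "cost": 50, "timeout_ms": 30000,
                 "tools": {"write_report"}},
}

class ExecutionStopped(Exception):
    pass

def call_tool(role, tool, log):
    """Record every call in the event log, then enforce the role's allowlist."""
    allowed = LIMITS[role]["tools"]
    if tool not in allowed:
        log.append({"agent": role, "tool": tool, "stopped": "tool not in allowlist"})
        raise ExecutionStopped(f"{role} is not allowed to call {tool}")
    log.append({"agent": role, "tool": tool, "stopped": None})

log = []
call_tool("analysis", "compute_stats", log)      # allowed for this role
try:
    call_tool("analysis", "write_report", log)   # not in the analysis allowlist
except ExecutionStopped:
    pass
```

The point of the event log entries is the audit trail the scenario calls for: which agent, which tool, and why it was stopped.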

The nanny ecosystem

Nanny is designed to meet you where you are and grow with you.

Nanny CLI — The enforcement entry point. Governs any agent process in any language as its parent process supervisor. Install it once as a system tool and use nanny run from any project that has a nanny.toml with a [start] command configured.
nanny run                        # reads [start].cmd from nanny.toml
nanny run --limits=researcher    # activates a named limit set
Rust SDK — For Rust agents, go deeper. Annotate individual functions with #[nanny::tool], #[nanny::rule], and #[nanny::agent] to get per-function cost accounting, allowlist enforcement, and custom rules. See the Rust SDK guide.

Python SDK — The same model as the Rust SDK, expressed as Python decorators: @tool, @rule, @agent. Each agent in your fleet gets its own budget ceiling, tool allowlist, and custom rules. Works with LangChain, CrewAI, or any Python agent framework. See the Python SDK guide.

Nanny Cloud (coming soon) — Durable audit logs, team dashboards, org-level budget aggregation, and managed enforcement across all your agents. The OSS runtime stays unchanged — Cloud is the observability and coordination layer above it.
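The decorator model the SDKs describe — per-function cost accounting against a budget ceiling — can be sketched in a few lines. This is a hypothetical illustration of the shape, not the actual nanny API: the `tool(cost=...)` signature, the `Budget` class, and the `fetch` function are all assumptions made for the example.

```python
import functools

class Budget:
    """Shared budget ceiling for one agent. Illustrative, not the nanny SDK."""
    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.spent = 0

budget = Budget(ceiling=10)

def tool(cost):
    """Hypothetical @tool-style decorator: charge `cost` units per call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            budget.spent += cost
            if budget.spent > budget.ceiling:
                # Ceiling crossed: stop before the tool body runs.
                raise RuntimeError(f"cost budget exceeded: {budget.spent} > {budget.ceiling}")
            return fn(*args, **kwargs)
        return wrapper
    return decorate

@tool(cost=4)
def fetch(url):
    return f"data from {url}"

fetch("https://example.com")      # budget now partly spent
fetch("https://example.com")      # still under the ceiling
try:
    fetch("https://example.com")  # this call crosses the ceiling and is stopped
except RuntimeError:
    pass
```

In the real SDKs the equivalent accounting is enforced by the runtime rather than in-process, so a misbehaving function cannot simply catch the exception and carry on.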

Open source

The Nanny runtime is fully open source under the Apache 2.0 licence. Source code, issues, and contributions live at github.com/nanny-run/nanny. Cloud is the managed layer above the OSS primitive — not a replacement for it.

Next steps

Quickstart

Install nanny and run your first governed agent in under five minutes.

How it works

Understand the enforcement model and passthrough mode.

nanny.toml reference

Full schema for the configuration file.

Rust SDK guide

Per-function governance with #[nanny::tool], #[nanny::rule], #[nanny::agent].

Python SDK guide

Per-function governance with @tool, @rule, @agent decorators. Works with LangChain, CrewAI, and any Python agent framework.