Safety policy for constraining meta-agent modifications
facebookresearch/HyperAgents
HyperAgents executes model-generated code in a self-improvement loop where the meta-agent rewrites task agent source autonomously. The README correctly flags this as executing "untrusted, model-generated code."
We've put together a safety policy pack that constrains what the meta-agent can do during the optimization loop:
- **Reads**: unrestricted (meta-agent needs to observe task agent performance)
- **Writes**: restricted to `workspace/` only, behind an approval gate (prevents rewriting the evaluation harness, the meta-agent's own source, or system files)
- **Command execution**: blocked (meta-agent rewrites code; execution goes through the framework)
- **File deletion**: blocked (preserves full optimization history)
- **Network requests**: blocked (closed-loop optimization, no data exfiltration)
- **Rate limit**: 10 tool calls/minute (prevents runaway rewrite cycles)
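To make the rules above concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is not the `protect-mcp` implementation or its policy schema: the `PolicySketch` class, the action names (`read`, `write`, and so on), and the `require_approval` result are all hypothetical and chosen only for illustration. It also assumes paths are given relative to the repository root.

```python
import posixpath
import time
from collections import deque

MAX_CALLS = 10          # tool calls allowed per window (from the list above)
WINDOW_SECONDS = 60.0   # one minute


def _inside_workspace(path):
    """True if a repo-relative path stays under workspace/ after normalization."""
    if not path or posixpath.isabs(path):
        return False
    normalized = posixpath.normpath(path)  # collapses "a/../b" style escapes
    return normalized == "workspace" or normalized.startswith("workspace/")


class PolicySketch:
    """Hypothetical evaluator mirroring the listed rules (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._calls = deque()  # timestamps of calls inside the current window

    def _rate_limited(self):
        now = self._clock()
        # Drop timestamps that have aged out of the sliding window.
        while self._calls and now - self._calls[0] >= WINDOW_SECONDS:
            self._calls.popleft()
        if len(self._calls) >= MAX_CALLS:
            return True
        self._calls.append(now)
        return False

    def decide(self, action, path=None):
        if self._rate_limited():
            return "deny"
        if action == "read":
            return "allow"
        if action == "write":
            # Writes are never silently allowed: they need approval even
            # inside workspace/, and are denied everywhere else.
            return "require_approval" if _inside_workspace(path) else "deny"
        # Command execution, deletion, network, and unknown actions.
        return "deny"
```

The sketch is default-deny: anything not explicitly listed falls through to `deny`. It only normalizes path strings, so a real enforcement layer would also need to handle symlinks that point outside `workspace/`.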
Every action, whether allowed or denied, produces a signed receipt. Together, the receipts from a run form a verifiable audit chain, which is useful for debugging optimization regressions and for reproducibility.
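As an illustration of what a verifiable chain of receipts can look like, the sketch below links each receipt to the previous one and authenticates it with HMAC-SHA256. This is an assumption made for the example, not the receipt format or signature scheme that `protect-mcp` actually uses; the field names (`seq`, `prev`, `sig`) and the helpers `append_receipt` and `verify_chain` are hypothetical.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous signature" for the first receipt


def _digest(key, body):
    """HMAC-SHA256 over a canonical JSON encoding of the receipt body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_receipt(chain, key, action, target, decision):
    """Append a receipt that commits to the previous receipt's signature."""
    prev = chain[-1]["sig"] if chain else GENESIS
    body = {
        "seq": len(chain),
        "action": action,
        "target": target,
        "decision": decision,
        "prev": prev,
    }
    chain.append({**body, "sig": _digest(key, body)})


def verify_chain(chain, key):
    """Return True only if every receipt is intact and correctly linked."""
    prev = GENESIS
    for seq, receipt in enumerate(chain):
        body = {k: v for k, v in receipt.items() if k != "sig"}
        if body.get("seq") != seq or body.get("prev") != prev:
            return False
        if not hmac.compare_digest(receipt.get("sig", ""), _digest(key, body)):
            return False
        prev = receipt["sig"]
    return True
```

Because each receipt includes the previous signature, editing, reordering, or dropping an earlier receipt invalidates everything after it. An HMAC needs a shared secret to verify; a public-key signature would let third parties check the chain without holding the signing key.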
The policies are available in both JSON and [Cedar](https://www.cedarpolicy.com/) format (Cedar is the policy language used by Amazon Verified Permissions):
- JSON: [`hyperagent-sandbox.json`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.json)
- Cedar: [`hyperagent-sandbox.cedar`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.cedar)
Usage:
```bash
npx protect-mcp --policy hyperagent-sandbo...
```