Safety policy for constraining meta-agent modifications

facebookresearch/HyperAgents

Status: Open

Opened: Mar 28, 2026

Comments: 15

HyperAgents executes model-generated code in a self-improvement loop where the meta-agent rewrites task agent source autonomously. The README correctly flags this as executing "untrusted, model-generated code."

We've put together a safety policy pack that constrains what the meta-agent can do during the optimization loop:

- **Reads**: unrestricted (meta-agent needs to observe task agent performance)
- **Writes**: restricted to `workspace/` only, with approval gate (prevents rewriting evaluation harness, own source, or system files)
- **Command execution**: blocked (meta-agent rewrites code; execution goes through the framework)
- **File deletion**: blocked (preserves full optimization history)
- **Network requests**: blocked (closed-loop optimization, no data exfiltration)
- **Rate limit**: 10 tool calls/minute (prevents runaway rewrite cycles)

Every allowed and denied action produces a signed receipt. The full run produces a verifiable audit chain, useful for debugging optimization regressions and for reproducibility.

The policies are available in both JSON and [Cedar](https://www.cedarpolicy.com/) format (compatible with AWS Verified Permissions):

- JSON: [`hyperagent-sandbox.json`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.json)
- Cedar: [`hyperagent-sandbox.cedar`](https://github.com/tomjwxf/ScopeBlindD2/tree/main/examples/hyperagents/hyperagent-sandbox.cedar)

Usage: ```bash npx protect-mcp --policy hyperagent-sandbo...
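
To make the rules above concrete, here is a minimal Python sketch of how such a policy could be evaluated and how each decision could be chained into receipts. It is an illustration only, not the protect-mcp or HyperAgents API: the `Policy` class, the action names, the decision strings, and the `make_receipt`/`verify_chain` helpers are all hypothetical, and HMAC-SHA256 stands in for whatever signature scheme the real tool uses.

```python
import hashlib
import hmac
import json
from collections import deque
from pathlib import PurePosixPath

# All names below are hypothetical; this is not the protect-mcp API.
RATE_LIMIT = 10           # tool calls per window, as in the policy list
WINDOW_SECONDS = 60.0
BLOCKED = {"exec", "delete", "network"}


class Policy:
    def __init__(self, workspace="workspace"):
        self.workspace = PurePosixPath(workspace)
        self.calls = deque()  # timestamps of calls inside the window

    def _inside_workspace(self, path):
        parts = PurePosixPath(path).parts
        # Reject absolute paths and any ".." component instead of resolving.
        if not parts or parts[0] == "/" or ".." in parts:
            return False
        return parts[: len(self.workspace.parts)] == self.workspace.parts

    def decide(self, action, path=None, now=0.0, approved=False):
        # Drop timestamps that have left the sliding window.
        while self.calls and now - self.calls[0] >= WINDOW_SECONDS:
            self.calls.popleft()
        if len(self.calls) >= RATE_LIMIT:
            return "deny:rate_limit"
        # Assumption: every call that is not rate-limited counts,
        # whether it ends up allowed or denied.
        self.calls.append(now)
        if action == "read":
            return "allow"
        if action in BLOCKED:
            return "deny:blocked_action"
        if action == "write":
            if path is None or not self._inside_workspace(path):
                return "deny:outside_workspace"
            return "allow" if approved else "deny:needs_approval"
        return "deny:unknown_action"


def make_receipt(key, prev_sig, action, path, decision):
    # Each receipt commits to the previous signature, forming a chain.
    body = json.dumps(
        {"prev": prev_sig, "action": action, "path": path, "decision": decision},
        sort_keys=True,
    )
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def verify_chain(key, receipts):
    prev = ""
    for receipt in receipts:
        expected = hmac.new(key, receipt["body"].encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, receipt["sig"]):
            return False  # body or signature was altered
        if json.loads(receipt["body"])["prev"] != prev:
            return False  # receipt was reordered, dropped, or inserted
        prev = receipt["sig"]
    return True
```

Denied actions get receipts too, so in this sketch the chain records what the meta-agent attempted as well as what it did. A real deployment would more likely use asymmetric signatures so that verifiers do not need the signing key.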

Python

Other Comments / Reviews

Perfect — the DecisionLog events already having `tool_nam...

by 0xbrainkid Mar 31, 2026
@0xbrainkid — the integration diagram is clean. Receipt s...

by tomjwxf Mar 31, 2026
The receipt chain approach is cleaner than hooks inside t...

by 0xbrainkid Mar 31, 2026
Good observation on cumulative drift. Static per-action p...

by tomjwxf Mar 31, 2026
The safety policy pack addresses the right constraints — ...

by 0xbrainkid Mar 31, 2026