Safety and control mechanisms for self-improving AI agents (HyperAgents), specifically constraining meta-agent modifications and detecting behavioral drift.
Raw Developer Origin & Technical Request
GitHub Issue
Mar 28, 2026
HyperAgents executes model-generated code in a self-improvement loop in which the meta-agent autonomously rewrites task-agent source code. The README correctly flags this as executing "untrusted, model-generated code."
We've put together a safety policy pack that constrains what the meta-agent can do during the optimization loop:
- **Reads**: unrestricted (meta-agent needs to observe task agent performance)
- **Writes**: restricted to `workspace/` only, with approval gate (prevents rewriting evaluation harness, own source, or system files)
- **Command execution**: blocked (meta-agent rewrites code; execution goes through the framework)
- **File deletion**: blocked (preserves full optimization history)
- **Network requests**: blocked (closed-loop optimization, no data exfiltration)
- **Rate limit**: 10 tool calls/minute (prevents runaway rewrite cycles)
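Concretely, the pack could be expressed along these lines (the field names and schema here are illustrative; the actual `hyperagent-sandbox.json` may differ):

```json
{
  "policies": [
    { "tool": "read",    "effect": "allow" },
    { "tool": "write",   "effect": "allow", "paths": ["workspace/**"], "approval": true },
    { "tool": "exec",    "effect": "deny" },
    { "tool": "delete",  "effect": "deny" },
    { "tool": "network", "effect": "deny" }
  ],
  "rateLimit": { "toolCalls": 10, "per": "minute" }
}
```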
Every allowed and denied action produces a signed receipt. The full run produces a verifiable audit chain, useful for debugging optimization regressions and for reproducibility.
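One way such a chain can be checked end to end is a hash-chained, HMAC-signed receipt log. The sketch below is a minimal illustration of the idea; the field names, chaining scheme, and use of HMAC-SHA256 are assumptions, not the framework's actual receipt format:

```python
# Sketch: each receipt records the action, the policy decision, and the
# previous receipt's signature, so tampering anywhere breaks the chain.
# (Key handling is deliberately simplified for illustration.)
import hashlib
import hmac
import json

KEY = b"demo-signing-key"  # illustrative only; a real deployment would manage keys properly

def sign_receipt(action, decision, prev_sig):
    body = {"action": action, "decision": decision, "prev": prev_sig}
    payload = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return {**body, "sig": sig}

def verify_chain(receipts):
    prev = ""
    for r in receipts:
        body = {k: r[k] for k in ("action", "decision", "prev")}
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
        # Both the signature and the link to the previous receipt must hold.
        if r["prev"] != prev or not hmac.compare_digest(r["sig"], expected):
            return False
        prev = r["sig"]
    return True

# Build a two-receipt chain and verify it.
chain, prev = [], ""
for act, dec in [("read:agent.py", "allow"), ("write:/etc/passwd", "deny")]:
    rec = sign_receipt(act, dec, prev)
    chain.append(rec)
    prev = rec["sig"]

assert verify_chain(chain)
```

Flipping any recorded decision, or splicing the chain, makes `verify_chain` return `False`.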
The policies are available in both JSON and [Cedar](https://cedarpolicy.com) format (compatible with AWS Verified Permissions):
- JSON: [`hyperagent-sandbox.json`](github.com/tomjwxf/ScopeBlin...)
- Cedar: [`hyperagent-sandbox.cedar`](github.com/tomjwxf/ScopeBlin...)
Usage:
```bash
npx protect-mcp --policy hyperagent-sandbo...
```