karpathy/autoresearch
AI agents running research on single-GPU nanochat training automatically
Product Positioning & Context
AI Executive Synthesis
AI agents running research *automatically* to discover new architectures. The question challenges the guarantee of novelty.
This issue directly questions the core value proposition of 'autoresearch': how adding agents *guarantees* novel architectures. It surfaces a fundamental developer concern about the actual innovation output of multi-agent systems: there is no clear, demonstrable mechanism linking agent deployment to guaranteed novel outcomes, as opposed to mere optimization or iteration. The market implication is that AI agent platforms need a stronger, evidence-based narrative around their capacity for true innovation and discovery, beyond efficiency gains; this in turn suggests demand for agent designs that explicitly target and measure architectural novelty.
Active Developer Issues (GitHub)
Logged: Mar 8, 2026
Community Voice & Feedback
you just added a readme, maybe @karpathy can chime in
I was able to get Codex to loop, where it has `agent_loop.sh` for the while loop, `monitor_loop.sh` to monitor the agent, and `watchdog_loop.sh` to restart the agent loop.
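The watchdog half of the setup described above could be sketched roughly as follows; the file names, the PID-file convention, and the check interval are all assumptions for illustration, not code from the comment:

```shell
# Hypothetical sketch of a watchdog_loop.sh that restarts agent_loop.sh
# whenever it dies. All names and intervals here are assumptions.

AGENT_CMD=${AGENT_CMD:-./agent_loop.sh}  # the loop to keep alive
PID_FILE=${PID_FILE:-agent.pid}
CHECK_INTERVAL=${CHECK_INTERVAL:-10}     # seconds between liveness checks

agent_is_alive() {
  # kill -0 probes the process without sending it a signal
  [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

start_agent() {
  $AGENT_CMD & echo $! > "$PID_FILE"
}

watchdog_loop() {
  while true; do
    agent_is_alive || start_agent
    sleep "$CHECK_INTERVAL"
  done
}
```

A monitor script could reuse `agent_is_alive` to report status without restarting anything, which keeps the restart policy in one place.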
I think you can achieve a model-agnostic version of what you're looking for by using Pi (pi.dev: https://github.com/badlogic/pi-mono/) and combining it with the Interactive Shell extension (https://github.com/nicobailon/pi-interactive-shell), which can handle long-running looping behavior with the ability for both human and agent to monitor and interrupt/interact.
That way you can use one agent harness framework but change the models, have the models compete, collaborate, review, etc.
Codex/Claude subscriptions work in Pi via OAuth.
I ran into the same issue while using Codex. It seems to be related to the OpenAI API (or the model itself). I tried integrating GPT-5.4 into Claude Code, but it still wouldn't work continuously.
I'm having exactly this issue, with Codex using GPT-5.4.
I ended up having to run it in a `while` loop
```bash
while true; do
  codex exec --dangerously-bypass-approvals-and-sandbox "have a look at program.md and kick off a new experiment loop" 2>&1 | tee -a agent.log
  sleep 1
done
```
then I can search for "have a look at program.md" in agent.log to see it getting restarted.
But then I lose the interactivity of Codex.
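The log search mentioned above can be scripted: assuming the prompt string appears in the tee'd log exactly once per loop iteration (as in the while loop shown), `grep -c` counts how often the agent was (re)started.

```shell
# Count agent-loop (re)starts by counting occurrences of the prompt
# string in the appended log. Assumes one occurrence per iteration.
count_restarts() {
  grep -c "have a look at program.md" "$1"
}
```

This is only a heuristic: if the model ever echoes the prompt back into the log, the count will overshoot.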
experiencing this with 5.4?
In https://github.com/karpathy/autoresearch/pull/70 we can also do these manually, like the novelty verification part you're referring to. Seems to be an infinite loop.
Currently I can only talk to the experiments I made in the fork (https://github.com/mkemka/autoresearch/blob/master/spiritualguidance.md). There are two competing agents that argue and generate a combined directive, which is used to alter program.md for the next run. The history is stored in spiritualguidance.md and used as working memory. So to actually measure the utility, I would need to see whether this approach actually produces novelty or variance of ideas, and whether in the long term the loss is lower compared to a single agent.
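The two-agent debate loop described above could be sketched roughly like this; the `AGENT` command, the personas, the prompts, and the `ROUND:` log format are all assumptions for illustration, not code from the fork — any CLI that maps a prompt to text would do.

```shell
# Hypothetical sketch of the debate loop: two agents with different
# backgrounds each propose a change, a third call merges them, and the
# merged directive becomes the plan for the next run.
AGENT=${AGENT:-codex exec}  # assumed: a CLI mapping prompt -> text

debate_round() {
  a=$($AGENT "As a conservative researcher, propose one change to program.md")
  b=$($AGENT "As a contrarian researcher, propose one change to program.md")
  merged=$($AGENT "Combine these into one directive: $a / $b")
  printf 'ROUND: %s\n' "$merged" >> spiritualguidance.md  # working memory
  printf '%s\n' "$merged" > program.md                    # plan for next run
}
```

Appending every round to spiritualguidance.md while overwriting program.md matches the description above: the history accumulates, but only the latest combined directive drives the next experiment.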
so how do you measure the utility of novelty?
One approach I am experimenting with is to have two sub-agents with different backgrounds debate the best strategy to adopt. This doesn't guarantee a new architecture but adds novelty.
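One rough proxy for the "variance of ideas" question above is the fraction of distinct directives in the debate history; this is only a sketch, and the `ROUND:`-prefixed log format is an assumption about how that history is stored.

```shell
# Hypothetical "variance of ideas" proxy: distinct directives over total
# directives logged so far. A ratio near 1 means the agents keep producing
# new ideas; near 1/N means they keep repeating themselves.
diversity() {
  total=$(grep -c '^ROUND:' "$1")
  distinct=$(grep '^ROUND:' "$1" | sort -u | wc -l | tr -d ' ')
  echo "$distinct/$total distinct directives"
}
```

Exact string matching is crude; paraphrased directives would count as distinct, so a real metric would likely need embedding distance or human review on top of this.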
Related Early-Stage Discoveries
Discovery Source
GitHub Open Source. Aggregated via automated community intelligence tracking.
Tech Stack Dependencies
No direct open-source NPM package mentions detected in the product documentation.
Media Traction & Mentions
Deep Research & Science
No peer-reviewed scientific literature directly matched to this product's architecture.
Market Trends