Gemini Executive Synthesis

This issue documents Codex's inability to sustain continuous, non-stopping operation for autoresearch tasks, in contrast with Claude's behavior. The core problem is maintaining interactive, long-running agent sessions.

Technical Positioning
The goal is AI agents running research *automatically* and continuously; the issue highlights Codex's failure to achieve this continuous operation.
SaaS Insight & Market Implications
Codex is failing to execute continuous, non-stopping operations essential for 'autoresearch,' unlike Claude. This forces developers into cumbersome workarounds like external `while` loops, sacrificing critical interactive session capabilities. The pain point is the lack of native, robust looping mechanisms and the inability to maintain visibility and intervention in long-running agent tasks. Market implications include a significant barrier to deploying autonomous research agents with Codex, pushing users towards alternative models or complex external orchestration. The demand for model-agnostic, interactive, and persistent agent execution frameworks is evident, highlighting a critical gap in current AI agent tooling for complex, multi-step workflows.
Proprietary Technical Taxonomy
Codex · autoresearch · Claude · ignores instruction to never stop · /loop · interactive sessions · ralph loop · GPT 5.4

Raw Developer Origin & Technical Request

GitHub Issue • Mar 8, 2026
Repo: karpathy/autoresearch
Codex doesn't seem to work?

Codex doesn't work with autoresearch as far as I can tell (unlike Claude) because it ignores instruction to never stop. I'm not sure if there is a way to "kick it" that someone has found. In Claude that would be the new /loop (except as I mentioned it's not needed). I know you could have a ralph loop but those are not interactive sessions. I really much prefer an interactive session because you can see the work the agent is doing and also pitch in arbitrarily.

Developer Debate & Comments

SlipstreamAI • Mar 9, 2026
experiencing this with 5.4?
rankun203 • Mar 9, 2026
I'm having exactly this issue, with Codex using GPT 5.4. I ended up having to run it in a `while` loop ```bash while true; do codex exec --dangerously-bypass-approvals-and-sandbox "have a look at program.md and kick off a new experiment loop" 2>&1 | tee -a agent.log sleep 1 done ``` then I can search for "have a look at program.md" in agent.log to see it getting restarted. But then I lose the interactivity of Codex.
sen-ye • Mar 9, 2026
I ran into the same issue while using codex. It seems to be related to the OpenAI API (or the model itself). I tried integrating GPT-5.4 into Claude Code, but it still wouldn't work continuously.
Whamp • Mar 9, 2026
I think you can achieve a model-agnostic version of what you're looking for by using Pi (pi.dev, https://github.com/badlogic/pi-mono/) and combining it with the Interactive Shell extension (https://github.com/nicobailon/pi-interactive-shell), which can handle long-running looping behavior with the ability for both human and agent to monitor and interrupt/interact. That way you can use one agent harness framework but change the models, have the models compete, collaborate, or review, etc. Codex/Claude subscriptions work in Pi via OAuth.
jonathanpwang • Mar 9, 2026
I was able to get Codex to loop by having it create `agent_loop.sh` for the while loop, `monitor_loop.sh` to monitor the agent, and `watchdog_loop.sh` to restart the agent loop.
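The split-script arrangement above can be sketched roughly as follows. The function names mirror two of the script names from the comment, but the bodies are assumptions: a placeholder `echo` stands in for the real `codex exec` invocation, and the loop is bounded here so the sketch terminates.

```shell
#!/bin/sh
# Sketch of the agent-loop + watchdog pattern. Assumptions: the real
# agent_loop.sh would run `codex exec ...` instead of echo, and would
# loop forever rather than for MAX_ITERS iterations.

: > agent.log   # start with a fresh log

# agent_loop.sh (sketch): repeatedly kick off an agent run, logging each one.
agent_loop() {
  i=0
  while [ "$i" -lt "${MAX_ITERS:-3}" ]; do  # bounded only for demonstration
    echo "agent run $i" >> agent.log        # placeholder for codex exec
    i=$((i + 1))
    sleep 0.1
  done
}

# watchdog_loop.sh (sketch): watch the agent loop's PID and react when it dies.
watchdog() {
  agent_loop &
  pid=$!
  while kill -0 "$pid" 2>/dev/null; do      # process still alive?
    sleep 0.1
  done
  echo "agent loop exited; watchdog would restart it" >> agent.log
}

watchdog
```

Grepping `agent.log` afterward shows each restart, the same way the `tee -a` workaround above does; the third script (`monitor_loop.sh`) would tail the log for a human to watch.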

Adjacent Repository Pain Points

Other highly discussed features and pain points extracted from karpathy/autoresearch.

Extracted Positioning
A related issue asks by what mechanism adding agents contributes to generating novel architectures in autoresearch. The premise is AI agents running research *automatically* to discover new architectures; the question challenges whether novelty is actually guaranteed.
Top Replies
mkemka • Mar 9, 2026
One approach I am experimenting with is to have two sub-agents with different backgrounds debate the best strategy to adopt. This doesn't guarantee a new architecture but adds novelty.
ngoiyaeric • Mar 9, 2026
so how do you measure the utility of novelty?
mkemka • Mar 9, 2026
Currently I can only talk to the experiments I made in the fork (https://github.com/mkemka/autoresearch/blob/master/spiritualguidance.md). There are two competing agents that argue and generate a c...

Engagement Signals

19 Replies
Issue Status: open

Cross-Market Term Frequency

Quantifies the cross-market adoption of foundational terms like Claude and Codex by tracking occurrence frequency across active SaaS architectures and enterprise developer debates.

Macro Market Trends

Correlated public search velocity for adjacent technologies.

Autoresearch Claude Claude-agent-sdk