Gemini Executive Synthesis

Statewright – A Rust engine that evaluates visual state machine definitions to make AI agents reliable, integrating with LLMs via a plugin layer and offering a visual editor for workflow management.

Technical Positioning
Visual state machines that make AI agents reliable. It positions itself as an alternative to brute-forcing reliability with larger models, focusing on constraining tool and solution spaces for improved context utilization.
SaaS Insight & Market Implications
The agentic AI market faces significant reliability challenges, which teams often address inefficiently with larger models and longer prompts. Statewright confronts this with a protocol-enforced state machine that constrains LLM behavior and tool access. This shifts the focus from raw model size to context utilization; the author reports improved reliability and reduced token usage even with smaller models. The visual editor and explicit failure-path visualization address a developer pain point: debugging and managing complex, non-deterministic agent workflows. The product targets enterprises struggling with brittle AI deployments and offers a structured, auditable method for agent orchestration, which differentiates it in a young but rapidly maturing space.
Proprietary Technical Taxonomy
Agentic problem solving, brittle, massive parameter counts, massive context windows, brute forcing reliability, smaller models, 13-20B parameter range, SWE-bench problems

Raw Developer Origin & Technical Request

Hacker News • May 13, 2026
Show HN: Statewright – Visual state machines that make AI agents reliable

Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.

I'm Ben Cochran. I spent 20+ years in the trenches with full-stack engineering, DevOps, high performance computing and ML, with stints at NVIDIA, AMD and various other organizations, most recently as a Distinguished Engineer.

For agents to work reliably you either need massive parameter counts or massive context windows to keep the solution spaces workable. Most people are brute forcing reliability with bigger models and longer prompts.

What if I made the problem smaller instead of making the model bigger?

I took a different approach: I used smaller models in the 13-20B parameter range and set them to solving real SWE-bench problems. I constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and what transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write friendly bash tools. The testing state gets bash but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. It is enforced via protocol, not via prompts.

The results were more promising than I would have expected. The improvements held across multiple model families irrespective of age (qwen-coder, gpt-oss, gemma4) and were consistent above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: statewright.ai/research

Surprisin... this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight and Opus solves more reliably with fewer tokens and fewer death spirals. Fine tuning did not yield these kinds of functional improvements for me.

The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining LLMs which are non-idempotent by using deterministic code is a pattern that nobody is currently talking about.

So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. Its orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly, it tells the model when it's attempting to do something that isn't in scope or is incorrect, or when it needs to try something else after getting stuck.

You can use your agent via MCP to build a state machine for you to solve a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view... You can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.

Statewright is currently live with a free tier. Try it out in Claude Code by running the following:

/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins

Then "start the bugfix workflow" or /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; paste the API key again and say you really mean it. Claude is just being cautious here.

Feedback is welcome on the workflow editor and the plugin experience, and tell me what workflows you'd want to build first. Agents are suggestions, states are laws.
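The per-state restrictions described in the post (tool access, iteration budgets, valid transitions and guards) can be sketched roughly as follows. This is a minimal Python illustration of the general pattern, not Statewright's Rust engine or its workflow format. Every name here (State, Machine, call_tool, the tool names, the guard keys such as plan_written) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class State:
    """One workflow phase. All field names are hypothetical."""
    name: str
    allowed_tools: Set[str]
    max_iterations: int
    # Maps a target state name to a guard that must hold for the transition.
    transitions: Dict[str, Callable[[dict], bool]] = field(default_factory=dict)


class Machine:
    """Deterministic gatekeeper: no LLM involved, only rule checks."""

    def __init__(self, states: List[State], initial: str) -> None:
        self.states = {s.name: s for s in states}
        self.current = self.states[initial]
        self.iterations = 0

    def call_tool(self, tool: str) -> str:
        """Return 'ok' or a corrective message the model can act on."""
        state = self.current
        if tool not in state.allowed_tools:
            allowed = ", ".join(sorted(state.allowed_tools))
            return f"rejected: '{tool}' is out of scope in '{state.name}'; allowed: {allowed}"
        if self.iterations >= state.max_iterations:
            return f"rejected: iteration budget for '{state.name}' is exhausted"
        self.iterations += 1
        return "ok"

    def transition(self, target: str, context: dict) -> str:
        """Move to target only if the edge exists and its guard passes."""
        guard = self.current.transitions.get(target)
        if guard is None:
            return f"rejected: no transition from '{self.current.name}' to '{target}'"
        if not guard(context):
            return f"rejected: guard for '{self.current.name}' -> '{target}' not satisfied"
        self.current = self.states[target]
        self.iterations = 0  # each state gets a fresh budget
        return "ok"


def build_bugfix_machine() -> Machine:
    """A hypothetical plan -> implement -> test loop with a retry edge."""
    plan = State("plan", {"read_file", "search"}, 5,
                 {"implement": lambda ctx: bool(ctx.get("plan_written"))})
    implement = State("implement", {"read_file", "edit_file", "bash"}, 10,
                      {"test": lambda ctx: bool(ctx.get("files_changed"))})
    test = State("test", {"run_tests"}, 3,
                 {"implement": lambda ctx: not ctx.get("tests_passed", False),
                  "done": lambda ctx: bool(ctx.get("tests_passed", False))})
    done = State("done", set(), 0)
    return Machine([plan, implement, test, done], "plan")
```

In this sketch the rejection strings play the role the post describes: the model is told what is out of scope instead of silently failing. The test state's edge back to implement is the kind of retry loop that a DAG could not express.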

Developer Debate & Comments

miki_tyler • May 13, 2026
Very nice project! Is the editor/composer separate from the runtime? If I build a workflow in the visual editor, can I use that same flow inside my own app just by using the runtime/engine? Or is it mainly tied to the Statewright platform and Claude Code plugin? I’m wondering if the runtime can be used as a standalone piece to power apps I build.
brainless • May 13, 2026
I have to check how you are using state machines, but I have also been focused on small models for a while now. nocodo is one of my product experiments, currently using a 120B model, but I have tested a few agents inside it with 20B models. I create a bunch of agents, each with very specific goals, like Project Manager, Backend Engineer, etc. Each agent gets a very compact list of tools and access to only certain parts of the filesystem or commands. https://github.com/brainless/nocodo/tree/main/agents/src
redhale • May 12, 2026
I feel like caching should be mentioned in tradeoffs, right? If you change the tool list frequently, that's a cache bust. In long sessions that seems like it could significantly affect costs.
2001zhaozhao • May 12, 2026
Interesting. In your Github, the JSON format shown for defining custom workflows is very simple. I wonder if that limits the detail in the state-related instructions and error messages you can send to a model. For example, in state transitions, does your tool just tell the model something like "you are in 'act' mode and no longer in 'plan' mode, here are your new available tools"? Seems difficult to give it any more informative messages given how simple the workflow definitions are. Likewise when the model attempts to do something that's not supported for tools in the given phase.
chris_st • May 12, 2026
Please add support for the Windsurf editor as well. Thanks!
embedding-shape • May 12, 2026
I wanted to try to reproduce the research results (https://github.com/statewright/statewright#research-results) locally but I wasn't able to find the code for it. Have you published the code for running those somewhere? The research page (https://statewright.ai/research) mentions a patent and a "core engine":
> Provisional patent application filed: /054,240 (April 30, 2026). 35 claims covering state machine guardrail enforcement for LLM agent tool access. The core engine remains Apache 2.0 open source.
I'm not sure I understand what the "core engine" is if it's not the "state machine guardrail runtime", which is what the patent covers. What parts are the open source parts exactly? I find the idea really interesting and was nodding along the way as I read what you wrote. It makes sense both for the human and the agent and seems like a really nice idea that'd help, but the patent kind of makes me want to run away and not look into it too deeply.
esafak • May 12, 2026
I just have a smart model write a testable phased plan, have a cheaper model implement them, and yet another model to review each phase. I don't see the value of adding a Rust state engine. Algorithmically verifiable things can be tests, and more nebulous things (like pattern compliance) need an LLM to do the heavy lifting and can make mistakes, so what does the state engine buy you?
davidkpiano • May 12, 2026
Pretty cool. Looks like stately.ai but catered towards agentic state machine workflows. Really interesting!
password4321 • May 12, 2026
Does it make sense to ship an MCP code mode API? I'm surprised you're recommending MCP as-is when concerned about context usage optimization. I don't have a lot of hands-on experience either way yet so I'm curious what's best and/or most popular... I understand MCP is less effort and still affordable at VC-subsidised prices.
giancarlostoro • May 12, 2026
Interesting. I built a ticketing system similar to Beads, which has yielded more predictable results with Claude and other models, and I'm currently building a custom harness. I'm able to use offline models, though my GPU RAM bandwidth is much lower. I'm also planning on doing something similar to what you've built, namely the editing tools and whatnot. I hate how long it takes for Claude to look for files; it feels wasteful. I'm still astounded that everyone else has figured out ways to speed up harnesses, but Claude Code is still slow like a slug. I don't even care if I am waiting on the LLM in terms of slowness, but running local tools slowly bothers the living crap out of me. Stop using grep, RIPGREP IS FASTER! In any case, I'll have to check out Statewright after work ;)
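redhale's caching point in the thread above can be made concrete with a small sketch. It assumes a provider that caches prompts by exact prefix and places tool definitions at the front of that prefix, so any change to the tool list changes the prefix. It also assumes, as a simplification, that only the most recent prefix stays warm. The function names and tool names are hypothetical, and the thread does not say how Statewright itself handles this.

```python
import hashlib
import json
from typing import List, Optional


def prefix_cache_key(tools: List[str], system_prompt: str) -> str:
    """Hash the stable request prefix: tool list first, then system prompt."""
    payload = json.dumps({"tools": tools, "system": system_prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def count_cache_misses(tool_lists: List[List[str]], system_prompt: str) -> int:
    """Count requests whose prefix differs from the previous request's prefix.

    Simplified model: only the immediately preceding prefix is cached.
    """
    misses = 0
    previous: Optional[str] = None
    for tools in tool_lists:
        key = prefix_cache_key(tools, system_prompt)
        if key != previous:
            misses += 1
        previous = key
    return misses


# Hypothetical per-state tool lists for a plan -> implement -> test workflow.
PLAN = ["read_file", "search"]
IMPLEMENT = ["read_file", "edit_file", "bash"]
TEST = ["run_tests"]

# One fixed tool list for the whole session: a single cold start.
steady_misses = count_cache_misses([IMPLEMENT] * 6, "You fix bugs.")

# Tool list follows the state machine, including one retry back to implement.
switching_misses = count_cache_misses(
    [PLAN, PLAN, IMPLEMENT, IMPLEMENT, TEST, IMPLEMENT], "You fix bugs."
)
```

Under these assumptions every state change costs one cache miss, so the tradeoff depends on how often a workflow switches states relative to how many turns it spends inside each one.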

Frequently Asked Questions

Market intelligence mapped to Statewright.

What problem does Statewright solve?
Based on our AI analysis of the original developer request, its primary technical positioning is: Visual state machines that make AI agents reliable. It positions itself as an alternative to brute-forcing reliability with larger models, focusing on constraining tool and solution spaces for improved context utilization.
How is the developer community reacting to Statewright?
We have tracked 27 direct responses and active debates on this topic originating from Hacker News.
Which technical concepts are associated with Statewright?
Our proprietary extraction maps Statewright to adjacent architectural concepts including agentic problem solving, brittleness, massive parameter counts, and massive context windows.

Engagement Signals

Upvotes: 83
Comments: 27
