GitHub Issue
[Feature]: Locked Gherkin DSL -- Bridging Shannon-Kolmogorov Gap for ~~Proven~~ Demonstrated Accuracy Gains
### Summary
AI-Drafted, several HITL iterations, then edited:
Add AI-assisted generation of locked Gherkin (`.feature`) files as a low-Kolmogorov-complexity DSL layer in GSD-2 — this single change turns GSD-2 from “statistically good” toward “algorithmically reliable” code generation.
### Problem to solve
GSD-2’s current spec-driven workflow (natural-language specs → code) inherits the statistical next-token-prediction limitations analyzed in [Dalal & Misra (arXiv:2402.03175)](https://arxiv.org/pdf/2402.03175).
LLMs optimize *Shannon entropy* (output statistics) extremely well, but struggle with *Kolmogorov complexity* (minimal programmatic descriptions) — the exact tension Vishal Misra highlights in his recent writing ([“Shannon Got AI This Far. Kolmogorov Shows Where It Stops”](https://medium.com/@vishalmisra/shannon-got-ai-this-far-kolmogorov-shows-where-it-stops-c81825f89ca0)).
Without a low-complexity formal structure, in-context learning remains noisy and hallucination-prone.
### Proposed solution
Add native support for **Gherkin** (`.feature` files using Given/When/Then and additional constraints) as a first-class DSL inside GSD-2’s spec pipeline:
1. `/gsd testify` (or equivalent) generates [best-practice Gherkin](https://cucumber.io/docs/bdd/better-gherkin) `.feature` files from the current high-level natural language spec.md (AI-assisted, exactly as the paper demonstrates LLMs can learn a custom DSL in-context).
2. **Cryptographic locking** (SHA-256 hash ...
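The locking step in item 2 can be sketched in plain Python. This is a minimal illustration, not GSD-2 code: the function names (`lock_feature`, `verify_feature`) and the sample `.feature` content are assumptions for the sketch; only the use of SHA-256 comes from the proposal itself.

```python
import hashlib
from pathlib import Path

def lock_feature(path: str) -> str:
    """Compute a SHA-256 lock hash over a .feature file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_feature(path: str, locked_hash: str) -> bool:
    """Return True if the file still matches its recorded lock hash."""
    return lock_feature(path) == locked_hash

# Sample Gherkin content, written out so the sketch is self-contained.
feature = "\n".join([
    "Feature: Checkout",
    "  Scenario: Successful purchase",
    "    Given a cart with one item",
    "    When the user pays",
    "    Then an order confirmation is shown",
])
Path("checkout.feature").write_text(feature)
locked = lock_feature("checkout.feature")
```

Any later edit to the file (by a human or by the model) changes the hash, so a verification gate can refuse to proceed until the lock is explicitly refreshed.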
Developer & User Discourse
igouss • Mar 26, 2026
I think it's not a bad idea.
> BDD (Behavior-Driven Development) is a software development approach where you define how the system should behave from the user’s perspective before writing the actual code.
It's kind of a natural fit to describe what needs to be done to AI.
0mm-mark • Mar 26, 2026
> It's kind of a natural fit to describe what needs to be done to AI.
Agree. And instinctively I've been interacting with AI using Gherkin habits... But it was nice to see a formal demonstration and explanation (proof is too strong a term) of what the magnitude of the effect is.
jeremymcs • Mar 26, 2026
The main issue is VISION.md alignment. The project is extension-first: if it can be an extension, it should be. Nothing here requires core integration. GSD-2 already has an extension registration system, custom workflow definitions with pluggable verification policies, and a step-based engine that handles sequencing and artifact production. Gherkin generation, hash locking, and BDD enforcement all fit on top of that without touching core.
As proposed, this would cut across state management, the verification gate, auto-mode, preferences, and the planning pipeline — deep core changes for an opt-in workflow preference. That bumps into "complexity without user value" territory per VISION.md, especially with config flags like `shannon_kolmogorov_bias` that require reading a paper to understand.
The path forward would be to build this as an extension. Prove the value there across different providers and project types. If it demonstrates clear improvement, then there's a conversation about ...
0mm-mark • Mar 26, 2026
@jeremymcs thanks for the guidance around next steps.
This sounds like a blocker in the shadows:
> If it demonstrates clear improvement...
I think it's useful to first establish what those criteria would be, specifically where the paper falls short. Then that evidence can be gathered.
> ... especially with config flags like shannon_kolmogorov_bias that require reading a paper to understand.
I think docs would be sufficient. `feature_weight`: `none, partial, full` would be equivalent.
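The renaming suggested above can be made concrete with a tiny mapping. This is purely illustrative: `FEATURE_WEIGHT_LEVELS`, the numeric values, and `resolve_feature_weight` are all made-up names, sketching how a user-facing `feature_weight` setting could stand in for the paper-derived `shannon_kolmogorov_bias` flag.

```python
# Hypothetical user-facing levels; numeric weights are arbitrary placeholders.
FEATURE_WEIGHT_LEVELS = {
    "none": 0.0,     # ignore locked .feature files during generation
    "partial": 0.5,  # blend Gherkin constraints with the prose spec
    "full": 1.0,     # treat locked .feature files as binding
}

def resolve_feature_weight(level: str) -> float:
    """Map a config-friendly level name to an internal weight."""
    try:
        return FEATURE_WEIGHT_LEVELS[level]
    except KeyError:
        raise ValueError(
            f"feature_weight must be one of {sorted(FEATURE_WEIGHT_LEVELS)}"
        )
```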
This was automatically flagged for maintainer review.
**Flag:** Complexity without user value
This proposal introduces significant architectural complexity (cryptographic locking, new DSL layer, configuration flags, validation gates) based primarily on theoretical arguments from a machine learning paper rather than demonstrated user problems in GSD-2. The issue conflates LLM reasoning theory with practical GSD-2 workflows without evidence that current spec-driven generation is failing in ways users experience. Per VISION.md, complexity requires user-visible improvement—this reads as over-engineered infrastructure for a hypothetical problem.
Please review our [VISION.md](https://github.com/gsd-build/GSD-2/blob/main/VISION.md) and [CONTRIBUTING.md](https://github.com/gsd-build/GSD-2/blob/main/CONTRIBUTING.md) for project guidelines.
A maintainer will review this shortly. If you believe this was flagged in error, no action is needed — we'll take a loo...