gsd-build/gsd-2
A powerful meta-prompting, context engineering and spec-driven development system that enables agents to work for long periods of time autonomously without losing track of the big picture
Product Positioning & Context
AI Executive Synthesis
GSD-2 as the execution backend for the broader AI agent ecosystem (OpenClaw, MCP, CI/CD)
This Architectural Decision Record (ADR) outlines a critical strategic move for GSD-2: solidifying its headless mode and JSON-RPC protocol as the primary programmable surface for AI agent ecosystems. Positioning GSD-2 as an execution backend for platforms like OpenClaw and CI/CD pipelines is a direct response to market demand for robust, interoperable AI tools. The proposal addresses current limitations like weak completion detection and lack of versioning, which are anti-patterns in automation. While the strategic direction is sound, the internal audit reveals factual inaccuracies regarding existing MCP server capabilities, indicating a need for precise baseline assessment before locking in architectural changes.
Active Developer Issues (GitHub)
Community Voice & Feedback
## VISION.md Alignment Review
**Supports:**
- **Extension-first**: Making GSD an execution backend for OpenClaw, MCP, and CI/CD is exactly the kind of ecosystem integration that the extension model enables
- **Ship fast**: The headless/RPC layer already exists with three internal consumers -- hardening it for external use is incremental, not greenfield
**Concerns:**
- The scope is broad (OpenClaw, MCP server, CI/CD, SDKs). Each integration should be independently shippable.
- The MCP server integration has a timing challenge (60s timeout vs 5-30min operations) that needs careful design for the start/poll/complete pattern.
**Recommendation:** Accept. Prioritize the MCP server exposure first (highest ecosystem demand), then OpenClaw/CI/CD integrations. Each should ship independently.
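The start/poll/complete pattern flagged above could be sketched as follows. This is a minimal illustration, not the GSD-2 API: all names (`startJob`, `pollJob`, `completeJob`) and the in-memory store are assumptions standing in for the headless RPC backend.

```typescript
// Sketch: long-running GSD operations (5-30 min) behind a short MCP tool
// timeout (~60s). The start tool returns immediately; the client polls.

type JobStatus = "running" | "complete" | "failed";

interface Job {
  id: string;
  status: JobStatus;
  result?: string;
}

// In-memory job store standing in for the headless RPC backend.
const jobs = new Map<string, Job>();

// MCP tool 1: kick off the operation and return well inside the timeout.
function startJob(id: string): Job {
  const job: Job = { id, status: "running" };
  jobs.set(id, job);
  return job;
}

// MCP tool 2: cheap status poll the client calls repeatedly.
function pollJob(id: string): Job {
  const job = jobs.get(id);
  if (!job) throw new Error(`unknown job: ${id}`);
  return job;
}

// Backend side: mark the job done when the long operation finishes.
function completeJob(id: string, result: string): void {
  const job = jobs.get(id);
  if (job) {
    job.status = "complete";
    job.result = result;
  }
}

startJob("build-42");
completeJob("build-42", "ok");
console.log(pollJob("build-42").status); // "complete"
```

The key design point is that no single MCP call ever blocks for the duration of the underlying operation.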
Keeping it small and modular will also help AI reasoning when developing this app.
I think we should also consider packaging common extensions, such as [Language] developer, tester, etc., out of the core into reusable skill/extension packs.
### ADR-006 & Implementation Plan Review
I have completed a comprehensive review of the ADR and the [Implementation Plan](https://github.com/jeremymcs/gsd-2/blob/feat/extension-system-analysis/.plans/IMPLEMENTATION-PLAN-extension-modularization.md) against the current codebase. My analysis confirms that the **177,005 LOC `gsd` monolith** and the **842 MB installation size** are critical bottlenecks that this plan accurately addresses.
#### **Verification Results**
- **Monolith Bloat:** Confirmed `src/resources/extensions/gsd` contains **177,005 lines of code**, verifying the urgent need for Phase 5 (Core Decomposition).
- **Architectural Coupling:** Identified **15+ direct imports** from `../gsd/` in peripheral extensions (e.g., `github-sync`, `subagent`), confirming that "reverse dependencies" are a primary blocker for modularity.
- **Dependency Analysis:** The root `package.json` is heavily burdened by `playwright` and multiple AI SDKs. Phase 6 (AI Provider Lazy Loading) and Phase ...
## Implementation Plan: Extension Modularization
**Full plan:** [`.plans/IMPLEMENTATION-PLAN-extension-modularization.md`](https://github.com/jeremymcs/gsd-2/blob/feat/extension-system-analysis/.plans/IMPLEMENTATION-PLAN-extension-modularization.md)
### Summary
**Goal:** Make GSD2 lightweight out of the box by extracting optional features into installable extensions, reducing core footprint from 177K LOC to ~15-20K LOC and `node_modules` from 842 MB to ~450 MB.
### Phases
| Phase | Description | Effort | Risk | Dependencies |
|-------|-------------|--------|------|-------------|
| **0** | Foundation — Dependency Cleanup & Architecture Fixes | Small (1-2 sessions) | Low | None |
| **1** | Extension Install Infrastructure (`gsd extensions install`) | Medium (2-3 sessions) | Medium | None |
| **2** | Extract Self-Contained Extensions (Tier 1) — 8 extensions | Medium (2-3 sessions) | Low | Phase 1 |
| **3** | Preferences Service & Tier 2 Extraction — 6 extensions | Medium-Large (3-4 s...
## Research Findings (2026-03-28)
4 parallel researchers completed — Stack, Features, Architecture, Pitfalls. Full synthesis in `.planning/research/SUMMARY.md`.
### Stack
- **One new dependency:** `semver ^7.6.3` — everything else uses existing deps or Node.js built-ins
- **npm subprocess pattern:** `spawnSync("npm", ["install", "--prefix", targetDir, "--ignore-scripts"])` — no programmatic npm API (none is stable)
- **EventBus already exists:** `packages/pi-coding-agent/src/core/event-bus.ts` exposed as `pi.events` — no new event library needed
- **Existing utilities:** `proper-lockfile` (concurrent registry mutation), `hosted-git-info` (git URL parsing), `undici`/native `fetch` (registry API)
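The npm subprocess pattern above can be sketched as below. The package name and target directory are hypothetical; the important flags are `--prefix` (install into the extension's own directory) and `--ignore-scripts` (don't run untrusted postinstall hooks). Building the argv in a separate pure function keeps it inspectable without actually spawning npm.

```typescript
import { spawnSync } from "node:child_process";

// Assemble the npm argv separately so it can be checked in isolation.
function npmInstallArgs(pkg: string, targetDir: string): string[] {
  return ["install", pkg, "--prefix", targetDir, "--ignore-scripts"];
}

// Run npm as a subprocess — there is no stable programmatic npm API.
function installExtension(pkg: string, targetDir: string): boolean {
  const res = spawnSync("npm", npmInstallArgs(pkg, targetDir), {
    encoding: "utf8",
  });
  return res.status === 0;
}

// "@gsd/ext-github-sync" is an illustrative package name, not a real one.
console.log(npmInstallArgs("@gsd/ext-github-sync", "/tmp/ext").join(" "));
```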
### Architecture
- **7 confirmed coupling sites** between gsd ↔ cmux: 5 in gsd importing cmux, 2 in cmux importing gsd types
- **1 reverse dep:** `shared/rtk-session-stats.ts:5` → `gsd/paths.js`
- **Discovery is directory-agnostic:** `discoverExtensionEntryPaths()` already works for any dir...
## ADR-006 Review: Findings and Recommendations
As Grok, built by xAI, I've reviewed ADR-006: Extension Modularization & Install Infrastructure based on a deep exploration of the codebase and the ADR content. Here's my analysis and recommendations.
### Overall Assessment
- **Architectural Soundness**: Excellent. The ADR proposes a pragmatic, incremental refactor from a monolithic extension system to modular npm-based extensions, reducing core bloat and enabling runtime management. It aligns with GSD's evolutionary approach (e.g., v1.3 foundation).
- **Clarity and Structure**: High. Well-organized with a clear problem, milestones, and tables. The future dates are placeholders.
- **Feasibility**: Strong. Leverages npm, Node.js, and existing Pi SDK. CLI commands and event decoupling are straightforward.
- **Alignment with Codebase**: Perfect. Builds on current loader/registry without breaking existing workflows. Event bus fixes known coupling issues.
- **Readiness**: Ready for v1.3 impl...
Hi! I'd like to work on this. My understanding is that the current model routing system (ADR-004) scores models based on capabilities but assumes universal tool compatibility, which leads to silent failures and degraded context when switching providers. It lacks a declarative way to express provider constraints (like image support or schema limits) and tool requirements. I'm planning to fix it by implementing the Phase 1-3 roadmap described in the ADR: building the `ProviderCapabilities` registry in `pi-ai` to replace scattered provider-specific logic, adding `ToolCompatibility` metadata to `ToolDefinition`, and integrating the new hard-filtering step into `resolveModelForComplexity()` before the ADR-004 capability scoring. I will also build out the `ProviderSwitchReport` for cross-provider context tracking. This will be a substantial architectural lift across the core packages. Does this approach sound good?
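The hard-filtering step described above could look roughly like the sketch below. The `ProviderCapabilities` shape, the requirement fields, and the candidate data are all illustrative assumptions, not the actual `pi-ai` types.

```typescript
// Sketch: hard capability filter that runs BEFORE ADR-004 scoring, so
// incompatible models are removed outright rather than merely penalized.

interface ProviderCapabilities {
  supportsImages: boolean;
  maxToolSchemaDepth: number;
}

interface Candidate {
  model: string;
  caps: ProviderCapabilities;
  score: number; // ADR-004 capability score, applied only to survivors
}

interface TaskRequirements {
  needsImages: boolean;
  toolSchemaDepth: number;
}

function filterCompatible(cands: Candidate[], req: TaskRequirements): Candidate[] {
  return cands.filter(
    (c) =>
      (!req.needsImages || c.caps.supportsImages) &&
      c.caps.maxToolSchemaDepth >= req.toolSchemaDepth,
  );
}

const pool: Candidate[] = [
  { model: "a", caps: { supportsImages: true, maxToolSchemaDepth: 5 }, score: 90 },
  { model: "b", caps: { supportsImages: false, maxToolSchemaDepth: 8 }, score: 95 },
];
// "b" scores higher but cannot handle images, so it never reaches scoring.
const eligible = filterCompatible(pool, { needsImages: true, toolSchemaDepth: 4 });
console.log(eligible.map((c) => c.model)); // ["a"]
```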
## Commit: `8dc83440` — Close capability validation gaps across all dispatch paths
**Branch:** `feat/provider-capability-registry`
### What was done
1. **Expanded `getRequiredToolNames()`** — from 4 to 16 unit types with accurate tool requirements (plan-*, run-uat, replan-slice, complete-*, reactive-execute, rewrite-docs, reassess-roadmap, gate-evaluate)
2. **Fixed 3 unprotected dispatch paths** that bypassed all capability checks:
- `auto-direct-dispatch.ts` (`/gsd dispatch`) — now applies capability overrides + tool filtering
- `guided-flow.ts` (`dispatchWorkflow`) — now filters incompatible tools after model selection
- `auto.ts` (`dispatchHookUnit`) — now gets capability overrides + tool adjustment
3. **New `applyToolCompatibilityAdjustment()`** — reusable function used by all 3 dispatch paths
4. **Plan-time capability validation** — `validatePlanCapabilities()` checks task descriptions/files against model pool capabilities; `formatCapabilityConstraints()` injected i...
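The reusable adjustment in point 3 could be sketched as follows. The function name mirrors the commit summary, but its signature and the `requiresImageInput` / `supportsImages` fields are assumptions for illustration.

```typescript
// Sketch: drop tools the selected model cannot physically drive, instead
// of letting the tool call fail silently at runtime.

interface ToolDefinition {
  name: string;
  requiresImageInput?: boolean;
}

interface ModelCapabilities {
  supportsImages: boolean;
}

function applyToolCompatibilityAdjustment(
  tools: ToolDefinition[],
  caps: ModelCapabilities,
): ToolDefinition[] {
  return tools.filter((t) => !t.requiresImageInput || caps.supportsImages);
}

const tools: ToolDefinition[] = [
  { name: "read_file" },
  { name: "inspect_screenshot", requiresImageInput: true },
];
const adjusted = applyToolCompatibilityAdjustment(tools, { supportsImages: false });
console.log(adjusted.map((t) => t.name)); // ["read_file"]
```

Because the function is pure over its inputs, all three dispatch paths can share it without re-deriving the compatibility logic.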
## ADR-005 Review: Findings and Recommendations (Revised)
As Grok, built by xAI, I've reviewed ADR-005: Multi-Model, Multi-Provider, and Tool Strategy based on a deep exploration of the codebase and the ADR content. Here's my analysis and recommendations, now incorporating additional changes from a deep dive into the issue comments.
### Overall Assessment
- **Architectural Soundness**: Excellent. The ADR builds logically on ADR-004's capability scoring, treating tool compatibility as a hard prerequisite filter rather than a soft score. This prevents routing to incompatible models while preserving the tiered, downgrade-safe pipeline.
- **Clarity and Structure**: High. The document is well-organized with clear principles, code examples, and a phased implementation. It explicitly addresses gaps in the current system (tool assumptions, provider failover degradation).
- **Feasibility**: Strong. Direct extensions to existing files (e.g., `model-router.ts` for filtering, `ToolDefinition` fo...
### **Gemini ADR-005 Review: Multi-Model, Multi-Provider, and Tool Strategy**
I have reviewed the proposal and its alignment with the existing routing architecture (ADR-004). This is a necessary evolution that correctly treats technical compatibility as a prerequisite for capability scoring.
#### **Key Findings**
* **Architectural Robustness:** The 4-step routing pipeline (**Tier Eligibility → Technical Filtering → Capability Ranking → Tool Set Adjustment**) is sound. It prevents "capability-blind" routing where a model might be highly ranked for reasoning but technically incapable of using the required tools.
* **Data Continuity:** The `ProviderSwitchReport` is a critical addition. Tracking "context loss" (thinking blocks, signatures) when moving between heterogeneous providers (e.g., Anthropic to Google) is essential for long-term session health.
* **Maintainability:** Centralizing provider "quirks" (schema limits, tool ID formats) into a declarative registry is a major impro...
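The 4-step pipeline above can be sketched with stubbed stages. Every stage implementation here is a placeholder; only the ordering is the point — technical filtering runs before capability ranking, so a high-scoring but incompatible model can never win.

```typescript
// Sketch of the routing pipeline order:
// tier eligibility -> technical filtering -> capability ranking -> tool adjustment.

type Model = { id: string; tier: number; compatible: boolean; score: number };

const tierEligibility = (ms: Model[], maxTier: number) =>
  ms.filter((m) => m.tier <= maxTier);
const technicalFiltering = (ms: Model[]) => ms.filter((m) => m.compatible);
const capabilityRanking = (ms: Model[]) => [...ms].sort((a, b) => b.score - a.score);

function route(models: Model[], maxTier: number): Model | undefined {
  const winner = capabilityRanking(
    technicalFiltering(tierEligibility(models, maxTier)),
  )[0];
  // Step 4 (tool set adjustment) would run here against `winner`.
  return winner;
}

const pick = route(
  [
    { id: "x", tier: 1, compatible: false, score: 99 },
    { id: "y", tier: 1, compatible: true, score: 80 },
  ],
  1,
);
console.log(pick?.id); // "y" — the higher-scoring "x" is excluded by the hard filter
```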
Codex
[P1] `ProviderSwitchReport` cannot be consumed by `before_model_select` at the point the ADR says it can. In the ADR, the report is defined after provider switching and message transformation (`Cross-Provider Conversation Continuity`), but line 240 says it should be available to `before_model_select`. That hook currently runs before scoring/selection, and even in the ADR pipeline it still fires before a concrete winning model/provider pair exists. At that point there is no single `fromApi -> toApi` switch to report yet, only a set of candidates. So this part of the design is internally inconsistent: either the hook has to move later, or it needs a different input such as per-candidate predicted switch cost instead of a realized `ProviderSwitchReport`. Relevant ADR lines: 223-240.
[P1] The proposed `models.json` override path is keyed at the wrong layer and will not map cleanly onto current config semantics. The ADR example uses `providers.openai-responses.capabilities` (lines 2...
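The distinction the first P1 draws can be made concrete with two shapes: a realized report (only constructible after a switch happens) versus the per-candidate predicted cost that `before_model_select` could actually consume. Both types and the field names are illustrative, not the ADR's.

```typescript
// Realized report: only exists AFTER a concrete fromApi -> toApi switch.
interface ProviderSwitchReport {
  fromApi: string;
  toApi: string;
  droppedThinkingBlocks: number; // context loss measured post-transformation
}

// What a pre-selection hook could consume instead: no switch has happened
// yet, only a candidate set with predicted costs.
interface PredictedSwitchCost {
  candidateApi: string;
  estimatedContextLoss: number;
}

function cheapestSwitch(costs: PredictedSwitchCost[]): PredictedSwitchCost {
  return costs.reduce((best, c) =>
    c.estimatedContextLoss < best.estimatedContextLoss ? c : best,
  );
}

const best = cheapestSwitch([
  { candidateApi: "anthropic", estimatedContextLoss: 3 },
  { candidateApi: "google", estimatedContextLoss: 1 },
]);
console.log(best.candidateApi); // "google"
```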
@jeremymcs thanks for the guidance around next steps.
This sounds like a blocker in the shadows:
> If it demonstrates clear improvement...
I think it's useful to first establish what that criteria would be, specifically where the paper falls short. Then that evidence can be gathered.
> ... especially with config flags like shannon_kolmogorov_bias that require reading a paper to understand.
I think docs would be sufficient. `feature_weight`: `none, partial, full` would be equivalent.
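The suggested rename could be as small as an enum-valued config field. This is a hypothetical sketch of the naming idea only; neither the type nor the field exists in GSD-2.

```typescript
// Hypothetical: a self-explanatory enum in place of the paper-derived
// shannon_kolmogorov_bias flag.
type FeatureWeight = "none" | "partial" | "full";

interface WorkflowConfig {
  feature_weight: FeatureWeight;
}

const cfg: WorkflowConfig = { feature_weight: "partial" };
console.log(cfg.feature_weight); // "partial"
```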
The main issue is VISION.md alignment. The project is extension-first: if it can be an extension, it should be. Nothing here requires core integration. GSD-2 already has an extension registration system, custom workflow definitions with pluggable verification policies, and a step-based engine that handles sequencing and artifact production. Gherkin generation, hash locking, and BDD enforcement all fit on top of that without touching core.
As proposed, this would cut across state management, the verification gate, auto-mode, preferences, and the planning pipeline — deep core changes for an opt-in workflow preference. That bumps into "complexity without user value" territory per VISION.md, especially with config flags like `shannon_kolmogorov_bias` that require reading a paper to understand.
The path forward would be to build this as an extension. Prove the value there across different providers and project types. If it demonstrates clear improvement, then there's a conversation about ...
> It's kind of a natural fit to describe what needs to be done to AI.
Agree. And instinctively I've been interacting with AI using Gherkin habits... But it was nice to see a formal demonstration and explanation (proof is too strong a term) of what the magnitude of the effect is.
I think it's not a bad idea.
> BDD (Behavior-Driven Development) is a software development approach where you define how the system should behave from the user’s perspective before writing the actual code.
It's kind of a natural fit to describe what needs to be done to AI.