GitHub Issue

ADR-005: Multi-Model, Multi-Provider, and Tool Strategy

Discovered On Mar 27, 2026
Primary Metric: open
# ADR-005: Multi-Model, Multi-Provider, and Tool Strategy

**Status:** Draft
**Date:** 2026-03-27
**Deciders:** Jeremy McSpadden
**Related:** ADR-004 (capability-aware model routing), ADR-003 (pipeline simplification), [PR #2755](https://github.com/gsd-build/gsd-2/pull/2755)

## Context

PR #2755 lands capability-aware model routing (ADR-004), extending the router from a one-dimensional complexity-tier system to a two-dimensional system that scores models across 7 capability dimensions. GSD can now intelligently pick the best model for a task from a heterogeneous pool.

But model selection is only one piece of the multi-model puzzle. The system now faces a set of structural gaps that become more pressing as users configure diverse provider pools:

### 1. Tool compatibility is assumed, not verified

Every registered tool is sent to every model regardless of provider. The `pi-ai` layer normalizes tool schemas per provider (Anthropic `tool_use`, OpenAI `function`, Google `functionDeclarations`, Bedrock `toolSpec`, Mistral `FunctionTool`), but there is no mechanism to express that:

- A model may not support tool calling at all (older/smaller models, some local models)
- A provider may not support certain schema features (Google Gemini doesn't support `patternProperties`; `sanitizeSchemaForGoogle()` patches this silently)
- Some tools produce image content in results that not all models can consume
- Tool call ID formats differ across providers (Anthropic: 64-char alphanumeric; O...
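The gap in §1 is easiest to see as a small declarative registry plus a hard filter. The sketch below is illustrative only; `ProviderCapabilities`, `toolCompatible`, and every field name are assumptions, not the ADR's final API.

```typescript
// Hypothetical sketch of a declarative provider-capability registry.
// Field names and entries are illustrative, not the ADR's final shape.
interface ProviderCapabilities {
  toolCalling: boolean; // false for models with no tool support at all
  schemaFeatures: {
    patternProperties: boolean; // e.g. Google Gemini cannot express this
  };
  imageToolResults: boolean; // can the model consume image tool results?
}

const registry: Record<string, ProviderCapabilities> = {
  anthropic: {
    toolCalling: true,
    schemaFeatures: { patternProperties: true },
    imageToolResults: true,
  },
  google: {
    toolCalling: true,
    // Today sanitizeSchemaForGoogle() patches this silently; a registry
    // would make the limitation explicit and routable-around instead.
    schemaFeatures: { patternProperties: false },
    imageToolResults: true,
  },
};

// Hard filter: a tool whose schema needs patternProperties is simply
// incompatible with providers that cannot express it.
function toolCompatible(provider: string, needsPatternProperties: boolean): boolean {
  const caps = registry[provider];
  if (!caps || !caps.toolCalling) return false;
  return !needsPatternProperties || caps.schemaFeatures.patternProperties;
}
```

The point of the registry shape is that incompatibility becomes a queryable fact the router can act on, rather than a silent patch buried in a provider adapter.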

Developer & User Discourse

jeremymcs • Mar 27, 2026
Codex

[P1] `ProviderSwitchReport` cannot be consumed by `before_model_select` at the point the ADR says it can. In the ADR, the report is defined after provider switching and message transformation (`Cross-Provider Conversation Continuity`), but line 240 says it should be available to `before_model_select`. That hook currently runs before scoring/selection, and even in the ADR pipeline it still fires before a concrete winning model/provider pair exists. At that point there is no single `fromApi -> toApi` switch to report yet, only a set of candidates. So this part of the design is internally inconsistent: either the hook has to move later, or it needs a different input such as per-candidate predicted switch cost instead of a realized `ProviderSwitchReport`. Relevant ADR lines: 223-240.
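The second option Codex suggests (per-candidate predicted switch cost instead of a realized report) could look something like the sketch below. All names here (`CandidateSwitchEstimate`, `estimateSwitchCosts`) are hypothetical, and the cost model is a deliberately crude stand-in.

```typescript
// Hypothetical alternative input for before_model_select: since no single
// fromApi -> toApi switch exists before a winner is chosen, hand the hook
// a predicted switch cost for each candidate instead.
interface CandidateSwitchEstimate {
  model: string;
  fromApi: string; // provider of the current conversation
  toApi: string;   // provider this candidate would switch to
  predictedContextLoss: number; // 0 = lossless, 1 = full loss of thinking blocks/signatures
}

function estimateSwitchCosts(
  currentApi: string,
  candidates: { model: string; api: string }[],
): CandidateSwitchEstimate[] {
  return candidates.map((c) => ({
    model: c.model,
    fromApi: currentApi,
    toApi: c.api,
    // Crude stand-in cost model: same provider is free, any cross-provider
    // switch is assumed to drop opaque provider-specific blocks.
    predictedContextLoss: c.api === currentApi ? 0 : 1,
  }));
}
```

Under this shape the hook keeps its current position before scoring, and the realized `ProviderSwitchReport` stays where the ADR defines it, after the switch actually happens.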

[P1] The proposed `models.json` override path is keyed at the wrong layer and will not map cleanly onto current config semantics. The ADR example uses `providers.openai-responses.capabilities` (lines 2...
jeremymcs • Mar 27, 2026
### **Gemini ADR-005 Review: Multi-Model, Multi-Provider, and Tool Strategy**

I have reviewed the proposal and its alignment with the existing routing architecture (ADR-004). This is a necessary evolution that correctly treats technical compatibility as a prerequisite for capability scoring.

#### **Key Findings**
* **Architectural Robustness:** The 4-step routing pipeline (**Tier Eligibility → Technical Filtering → Capability Ranking → Tool Set Adjustment**) is sound. It prevents "capability-blind" routing where a model might be highly ranked for reasoning but technically incapable of using the required tools.
* **Data Continuity:** The `ProviderSwitchReport` is a critical addition. Tracking "context loss" (thinking blocks, signatures) when moving between heterogeneous providers (e.g., Anthropic to Google) is essential for long-term session health.
* **Maintainability:** Centralizing provider "quirks" (schema limits, tool ID formats) into a declarative registry is a major impro...
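The 4-step pipeline the review endorses reads naturally as a filter-then-rank composition. A minimal sketch, with illustrative types and function names that are not the router's actual API:

```typescript
// Hypothetical candidate shape; the real router's types differ.
interface Candidate {
  model: string;
  tier: number;          // complexity tier the model qualifies for
  score: number;         // ADR-004 capability score
  toolCompatible: boolean; // result of the technical hard filter
}

// Sketch of the 4-step pipeline:
// 1. Tier Eligibility -> 2. Technical Filtering -> 3. Capability Ranking
// -> 4. Tool Set Adjustment (applied to the winner, elided here).
function route(candidates: Candidate[], requiredTier: number): Candidate | undefined {
  const eligible = candidates.filter((c) => c.tier >= requiredTier);   // step 1
  const compatible = eligible.filter((c) => c.toolCompatible);         // step 2: hard filter
  const ranked = [...compatible].sort((a, b) => b.score - a.score);    // step 3
  return ranked[0]; // step 4 would then trim the winner's tool list
}
```

Because step 2 runs before step 3, a model that scores highest for reasoning but cannot use the required tools never wins, which is exactly the "capability-blind" routing the review says the pipeline prevents.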
jeremymcs • Mar 27, 2026
## ADR-005 Review: Findings and Recommendations (Revised)

As Grok, built by xAI, I've reviewed ADR-005: Multi-Model, Multi-Provider, and Tool Strategy based on a deep exploration of the codebase and the ADR content. Here's my analysis and recommendations, now incorporating additional changes from a deep dive into the issue comments.

### Overall Assessment
- **Architectural Soundness**: Excellent. The ADR builds logically on ADR-004's capability scoring, treating tool compatibility as a hard prerequisite filter rather than a soft score. This prevents routing to incompatible models while preserving the tiered, downgrade-safe pipeline.
- **Clarity and Structure**: High. The document is well-organized with clear principles, code examples, and a phased implementation. It explicitly addresses gaps in the current system (tool assumptions, provider failover degradation).
- **Feasibility**: Strong. Direct extensions to existing files (e.g., `model-router.ts` for filtering, `ToolDefinition` fo...
jeremymcs • Mar 27, 2026
## Commit: `8dc83440` — Close capability validation gaps across all dispatch paths

**Branch:** `feat/provider-capability-registry`

### What was done

1. **Expanded `getRequiredToolNames()`** — from 4 to 16 unit types with accurate tool requirements (plan-*, run-uat, replan-slice, complete-*, reactive-execute, rewrite-docs, reassess-roadmap, gate-evaluate)

2. **Fixed 3 unprotected dispatch paths** that bypassed all capability checks:
- `auto-direct-dispatch.ts` (`/gsd dispatch`) — now applies capability overrides + tool filtering
- `guided-flow.ts` (`dispatchWorkflow`) — now filters incompatible tools after model selection
- `auto.ts` (`dispatchHookUnit`) — now gets capability overrides + tool adjustment

3. **New `applyToolCompatibilityAdjustment()`** — reusable function used by all 3 dispatch paths

4. **Plan-time capability validation** — `validatePlanCapabilities()` checks task descriptions/files against model pool capabilities; `formatCapabilityConstraints()` injected i...
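The commit summary names `applyToolCompatibilityAdjustment()` but does not show it; a plausible sketch of what a shared helper like that does, with assumed types and an assumed signature:

```typescript
// Hypothetical sketch of a shared tool-adjustment helper along the lines
// of the commit's applyToolCompatibilityAdjustment(); the real signature
// and filtering criteria may differ.
interface Tool {
  name: string;
  producesImageResults: boolean;
}
interface ModelCaps {
  imageToolResults: boolean; // can the selected model consume image results?
}

function applyToolCompatibilityAdjustment(tools: Tool[], caps: ModelCaps): Tool[] {
  // Drop tools whose results the selected model cannot consume, so all
  // three dispatch paths share one filtering rule instead of duplicating it.
  return tools.filter((t) => !t.producesImageResults || caps.imageToolResults);
}
```

Centralizing the rule is the point of the commit: once `auto-direct-dispatch.ts`, `guided-flow.ts`, and `auto.ts` all call one helper, a new compatibility rule lands in every dispatch path at once.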
Sigmabrogz • Mar 28, 2026
Hi! I'd like to work on this. My understanding is that the current model routing system (ADR-004) scores models based on capabilities but assumes universal tool compatibility, which leads to silent failures and degraded context when switching providers. It lacks a declarative way to express provider constraints (like image support or schema limits) and tool requirements. I'm planning to fix it by implementing the Phase 1-3 roadmap described in the ADR: building the `ProviderCapabilities` registry in `pi-ai` to replace scattered provider-specific logic, adding `ToolCompatibility` metadata to `ToolDefinition`, and integrating the new hard-filtering step into `resolveModelForComplexity()` before the ADR-004 capability scoring. I will also build out the `ProviderSwitchReport` for cross-provider context tracking. This will be a substantial architectural lift across the core packages. Does this approach sound good?
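The `ToolCompatibility` metadata mentioned in the Phase 1-3 plan might take a shape like the following. This is a sketch under stated assumptions: the field names and the optional-metadata convention are guesses, not the ADR's schema.

```typescript
// Hypothetical shape of the ToolCompatibility metadata Phase 2 would add
// to ToolDefinition; all field names are illustrative.
interface ToolCompatibility {
  producesImageResults: boolean;
  schemaFeatures: string[]; // JSON Schema features the tool's schema uses, e.g. ["patternProperties"]
}
interface ToolDefinition {
  name: string;
  // Absent metadata = assumed universally compatible (today's behavior),
  // keeping the rollout backward compatible.
  compatibility?: ToolCompatibility;
}

function requiresFeature(def: ToolDefinition, feature: string): boolean {
  return def.compatibility?.schemaFeatures.includes(feature) ?? false;
}
```

Making the metadata optional would let the hard-filtering step in `resolveModelForComplexity()` land before every tool is annotated, which matters given the scale of the lift the comment describes.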