Inconsistent error classification and retry logic for transient HTTP errors when interacting with AI providers (e.g., DashScope).
GitHub Issue
May 14, 2026
### Problem
Hi! While running batches against DashScope, I noticed turns sometimes
terminate immediately on a single transient HTTP error (e.g. `Request error: Server disconnected without sending a response`), without the
configured `max_provider_retries` taking effect.
There appear to be two classifiers that disagree:
- `provider/failures.py` maps `"request error"` / `"timeout"` to
  `ProviderFailureKind.TRANSPORT_TRANSIENT`.
- `FallbackPolicy.classify_error` in `engine/fallback.py` only recognizes
  `RATE_LIMIT`, `AUTH_FAILURE`, `OVERLOADED`, `CONTEXT_OVERFLOW`;
  anything else is classified as `UNKNOWN`, for which `should_retry` returns `False`.
So `httpx.RequestError` from `provider/openai.py` ends up as `UNKNOWN`
and skips retry.
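To make the mismatch concrete, here is a minimal sketch using stand-in code, not the real modules. Only the `"request error"` / `"timeout"` mapping, the four recognized categories, and `UNKNOWN` not being retried come from the description above; `FallbackReason`, `provider_classify`, `fallback_classify`, the `OTHER` member, the marker strings, and the set of retryable reasons are all assumptions for illustration.

```python
from enum import Enum, auto


class ProviderFailureKind(Enum):
    """Stand-in for the enum in provider/failures.py (only two members shown)."""
    TRANSPORT_TRANSIENT = auto()
    OTHER = auto()  # hypothetical catch-all, not taken from the real code


class FallbackReason(Enum):
    """Hypothetical name for the categories FallbackPolicy.classify_error returns."""
    RATE_LIMIT = auto()
    AUTH_FAILURE = auto()
    OVERLOADED = auto()
    CONTEXT_OVERFLOW = auto()
    UNKNOWN = auto()


def provider_classify(message: str) -> ProviderFailureKind:
    """Mimics the substring mapping described for provider/failures.py."""
    text = message.lower()
    if "request error" in text or "timeout" in text:
        return ProviderFailureKind.TRANSPORT_TRANSIENT
    return ProviderFailureKind.OTHER


# Invented marker strings; the real matching rules are not shown in this issue.
_FALLBACK_MARKERS = {
    "rate limit": FallbackReason.RATE_LIMIT,
    "unauthorized": FallbackReason.AUTH_FAILURE,
    "overloaded": FallbackReason.OVERLOADED,
    "context length": FallbackReason.CONTEXT_OVERFLOW,
}


def fallback_classify(message: str) -> FallbackReason:
    """Mimics a classifier that knows four categories and nothing else."""
    text = message.lower()
    for marker, reason in _FALLBACK_MARKERS.items():
        if marker in text:
            return reason
    return FallbackReason.UNKNOWN


def should_retry(reason: FallbackReason) -> bool:
    # Assumed retryable set; only "UNKNOWN is not retried" comes from the issue.
    return reason in {FallbackReason.RATE_LIMIT, FallbackReason.OVERLOADED}


message = "Request error: Server disconnected without sending a response"
provider_view = provider_classify(message)   # transient, per the provider layer
fallback_view = fallback_classify(message)   # UNKNOWN, per the fallback layer
retry = should_retry(fallback_view)          # False, so the turn terminates
```

In this sketch the same message is transient for one layer and `UNKNOWN` for the other, which matches what I see in the batches.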
Is the split intentional? If so, is there a recommended way to have
`FallbackPolicy` honor `TRANSPORT_TRANSIENT` so transient errors
flow into `max_provider_retries`?
Thanks!
### Proposed behavior
Transient transport errors should be retried according to `max_provider_retries`.
Specifically, errors classified by the provider layer as
`ProviderFailureKind.TRANSPORT_TRANSIENT` should not become `UNKNOWN` in
`FallbackPolicy`; they should be considered retryable by `should_retry`.
Non-transient and truly unknown errors can keep the current behavior.
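A rough sketch of the proposed behavior, under stated assumptions: the real `FallbackPolicy` signatures are not shown in this issue, so `ProviderError`, its `kind` attribute, `FallbackReason`, the `attempt` parameter, and `call_with_retries` are hypothetical names. The other existing categories are omitted and would keep their current handling.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ProviderFailureKind(Enum):
    TRANSPORT_TRANSIENT = auto()
    OTHER = auto()  # hypothetical catch-all


class FallbackReason(Enum):
    TRANSPORT_TRANSIENT = auto()  # proposed new category
    UNKNOWN = auto()


class ProviderError(Exception):
    """Hypothetical error carrying the provider layer's classification."""

    def __init__(self, message: str, kind: Optional[ProviderFailureKind] = None):
        super().__init__(message)
        self.kind = kind


@dataclass
class FallbackPolicy:
    max_provider_retries: int = 3

    def classify_error(self, error: Exception) -> FallbackReason:
        # Honor the provider layer's verdict instead of re-deriving it.
        if getattr(error, "kind", None) is ProviderFailureKind.TRANSPORT_TRANSIENT:
            return FallbackReason.TRANSPORT_TRANSIENT
        return FallbackReason.UNKNOWN

    def should_retry(self, reason: FallbackReason, attempt: int) -> bool:
        # attempt is the number of retries already used.
        return (
            reason is FallbackReason.TRANSPORT_TRANSIENT
            and attempt < self.max_provider_retries
        )


def call_with_retries(policy: FallbackPolicy, send: Callable[[], T]) -> T:
    """Call send(), retrying transient failures up to max_provider_retries."""
    attempt = 0
    while True:
        try:
            return send()
        except Exception as error:
            reason = policy.classify_error(error)
            if not policy.should_retry(reason, attempt):
                raise  # unknown errors and exhausted retries keep failing fast
            attempt += 1
```

With `max_provider_retries=2`, a request that fails twice with a transient error and then succeeds makes three calls in total, while an unclassified error still propagates on the first failure. Backoff between attempts is left out to keep the sketch short.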
### Area
CLI
### Alternatives considered
I tried increasing `max_provider_retries`, but it does not help because the
error is classified as `UNKNOWN` before retry logic runs.