Feedback on TurboQuant's quantization strategy, covering K/V norm disparity, the choice of attention quantization method (MSE vs. Prod), and outlier detection (dynamic vs. fixed).
GitHub Issue
Mar 28, 2026
Hi! We independently implemented TurboQuant and ran systematic benchmarks across 8 models. We found a few things that might be useful for your outlier.py implementation:
## K/V Norm Disparity
Modern models have dramatically different K vs V norms:
| Model | K norm | V norm | Ratio |
|-------|--------|--------|-------|
| GPT-2 | 11.8 | 2.0 | 6x |
| Phi-2 | 13.1 | 3.0 | 4x |
| Qwen2.5-3B | 172.1 | 3.3 | 52x |
| Qwen2.5-7B | 274.0 | 2.6 | 106x |
| Qwen2.5-1.5B | 778.6 | 4.3 | 182x |
This means K and V need very different bit budgets. When the K/V ratio exceeds 100x (Qwen2.5-7B and Qwen2.5-1.5B in the table above), K needs mixed precision; uniform quantization fails catastrophically.
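As a rough illustration of how a K/V norm ratio like the ones in the table could be measured, here is a minimal sketch. It assumes the norm is the mean per-token L2 norm of the cached vectors for one layer/head. The issue does not state the exact definition, and the function names and toy data are hypothetical.

```python
import math

def mean_l2_norm(vectors):
    """Average L2 norm over a list of equal-length vectors."""
    return sum(math.sqrt(sum(x * x for x in v)) for v in vectors) / len(vectors)

def kv_norm_ratio(keys, values):
    """Ratio of the mean key norm to the mean value norm (assumed definition)."""
    return mean_l2_norm(keys) / mean_l2_norm(values)

# Toy data, made up for illustration: two cached tokens, head dimension 2.
keys = [[3.0, 4.0], [6.0, 8.0]]    # norms 5 and 10, mean 7.5
values = [[0.6, 0.8], [0.0, 1.0]]  # norms 1 and 1, mean 1.0
ratio = kv_norm_ratio(keys, values)
print(f"K/V norm ratio: {ratio:.1f}x")
```

A real measurement would run over actual KV-cache tensors from a forward pass; plain lists are used here only to keep the sketch dependency-free.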
## MSE beats Prod for Attention
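The paper recommends TurboQuantProd (QJL) for keys. We found that using MSE for both K and V works much better:
- GPT-2 at b=3: MSE gives a +7.6% PPL increase, Prod gives +300%
- Reason: QJL variance is amplified by softmax. The estimator's noise lands on the attention logits, and softmax exponentiates them, so the noise does not average out in the attention weights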
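The softmax point can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reproduction of the benchmark: zero-mean Gaussian noise stands in for the estimation error on the attention logits, and the logit values, `sigma`, and function names are hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_noisy_softmax(logits, sigma, trials, seed=0):
    """Average the softmax output over many zero-mean Gaussian logit perturbations."""
    rng = random.Random(seed)
    acc = [0.0] * len(logits)
    for _ in range(trials):
        noisy = [x + rng.gauss(0.0, sigma) for x in logits]
        for i, p in enumerate(softmax(noisy)):
            acc[i] += p
    return [a / trials for a in acc]

# One dominant key and three weak ones (made-up logits).
logits = [4.0, 0.0, 0.0, 0.0]
clean = softmax(logits)
noisy_mean = mean_noisy_softmax(logits, sigma=2.0, trials=5000)

print("clean weight on top key:     ", round(clean[0], 3))
print("mean noisy weight on top key:", round(noisy_mean[0], 3))
```

Even though the noise has zero mean on the logits, the average attention weight on the dominant key drops, because softmax is nonlinear. An estimator that is unbiased for the inner product is therefore not unbiased for the attention output.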
## Dynamic vs Fixed Outlier Detection
Your outlier.py uses the paper's fixed allocation (32 outlier / 96 regular channels for d=128). We tried dynamic detection instead, treating a channel as an outlier when its RMS exceeds 3x the median channel RMS:
- Layer 0 has ~20% outlier channels (RMS up to 272 vs. a median of 1.7)
- Middle layers have only 4-6% outlier channels
- Per-layer dynamic detection may be more efficient than fixed allocation
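The detection rule described above (channel RMS greater than 3x the median) can be sketched as follows. This is a minimal illustration, not the code from either repository: the function names, the `threshold` parameter, and the toy activations are hypothetical, and the input is assumed to be a list of token rows for one layer.

```python
import math
from statistics import median

def channel_rms(activations):
    """RMS of each channel; `activations` is a list of token rows."""
    n_tokens = len(activations)
    n_channels = len(activations[0])
    return [
        math.sqrt(sum(row[c] ** 2 for row in activations) / n_tokens)
        for c in range(n_channels)
    ]

def detect_outlier_channels(activations, threshold=3.0):
    """Indices of channels whose RMS exceeds `threshold` times the median RMS."""
    rms = channel_rms(activations)
    cutoff = threshold * median(rms)
    return [c for c, r in enumerate(rms) if r > cutoff]

# Toy activations, made up for illustration: 2 tokens x 4 channels.
acts = [
    [1.0, -1.0, 50.0, 2.0],
    [-1.0, 1.0, -60.0, -2.0],
]
outliers = detect_outlier_channels(acts)
print("outlier channels:", outliers)
```

Running this per layer gives a different outlier count for each layer, which is what lets layer 0 keep more high-precision channels than the middle layers.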
## Result
With dynamic outlier detection (outliers at 8-bit, rest at 3-bit):
- Qwen2.5-1.5B: **3.6-bit avg, +2.1% PPL** (vs +78% with uniform 4.5-bit)
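For reference, the average bit width follows from the outlier fraction by simple weighting. The sketch below assumes the average counts only the quantized channel payload (8-bit outliers, 3-bit regular channels) and ignores any overhead for scales or channel indices. The issue does not say how the 3.6-bit figure was computed, and the helper names are hypothetical.

```python
def average_bits(outlier_fraction, outlier_bits=8, regular_bits=3):
    """Weighted average bits per channel for a given outlier fraction."""
    if not 0.0 <= outlier_fraction <= 1.0:
        raise ValueError("outlier_fraction must be between 0 and 1")
    return outlier_fraction * outlier_bits + (1.0 - outlier_fraction) * regular_bits

def implied_outlier_fraction(avg_bits, outlier_bits=8, regular_bits=3):
    """Outlier fraction that would produce a given average bit width."""
    return (avg_bits - regular_bits) / (outlier_bits - regular_bits)

layer0_bits = average_bits(0.20)  # a layer with ~20% outlier channels
middle_bits = average_bits(0.05)  # a layer with ~5% outlier channels
overall_fraction = implied_outlier_fraction(3.6)

print(f"layer with 20% outliers: {layer0_bits:.2f} bits")
print(f"layer with 5% outliers:  {middle_bits:.2f} bits")
print(f"fraction implied by a 3.6-bit average: {overall_fraction:.2f}")
```

Under these assumptions, a 3.6-bit average corresponds to roughly 12% of channels being kept at 8 bits across the model.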
Our implementation + all benchmark data: github.com/scos-lab/turboqua...
Great work on turboq...