
Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks

Repo: TheTom/turboquant_plus by TheTom
Posted: Mar 25, 2026

## CRITICAL: Perplexity test reveals quality failure

| Cache | PPL | vs f16 |
|-------|-----|--------|
| f16 | 6.121 | baseline |
| q8_0 | 6.111 | -0.16% |
| q4_0 | 6.142 | +0.34% |
| **turbo3** | **165.6** | **+2607%** ❌ |

turbo3 perplexity is 27× worse than f16. Speed benchmarks were measuring how fast the model produces wrong answers.

Root cause investigation needed. DO NOT update README with speed claims until quality is fixed.

Suspected causes:

1. Norm mismatch: quantize stores full 128-element group norm, dequant uses it as per-32-block norm
2. Pre-rotate-queries rotation matrix mismatch with quantize rotation
3. 3-bit packing bug in block size 32
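
To make suspected cause 1 concrete, here is a minimal Python sketch of how that kind of norm mismatch could inflate reconstructed values. It is not taken from the turboquant_plus code: the function names (`quantize`, `dequantize_correct`, `dequantize_mismatched`, `l2`) are hypothetical, the norm is assumed to be an L2 norm, and the direction vector is kept in full precision so that only the norm handling differs between the two paths.

```python
import math
import random

GROUP = 128  # elements covered by one stored norm (figure from the comment)
BLOCK = 32   # elements per dequantization block (figure from the comment)


def l2(v):
    """L2 norm of a list of floats (assumed norm type)."""
    return math.sqrt(sum(x * x for x in v))


def quantize(vec):
    """Hypothetical quantizer: one norm for the whole 128-element group,
    plus the unit-length direction (left unquantized on purpose)."""
    norm = l2(vec)
    direction = [x / norm for x in vec]
    return norm, direction


def dequantize_correct(norm, direction):
    """Applies the stored norm to the whole group, matching quantize()."""
    return [norm * x for x in direction]


def dequantize_mismatched(norm, direction):
    """Assumed failure mode: treats the group norm as the norm of each
    32-element block, so every block is rescaled to length `norm`."""
    out = []
    for start in range(0, GROUP, BLOCK):
        block = direction[start:start + BLOCK]
        block_norm = l2(block)
        out.extend(norm * x / block_norm for x in block)
    return out


random.seed(0)
vec = [random.gauss(0.0, 1.0) for _ in range(GROUP)]
norm, direction = quantize(vec)

good = dequantize_correct(norm, direction)
bad = dequantize_mismatched(norm, direction)

good_ratio = l2(good) / l2(vec)
bad_ratio = l2(bad) / l2(vec)
print(f"correct path scale:    {good_ratio:.3f}")
print(f"mismatched path scale: {bad_ratio:.3f}")
```

Under these assumptions each of the four blocks ends up with the full group norm, so the reconstructed vector is sqrt(128 / 32) = 2 times too long and the relative weighting between blocks is lost. Whether the real kernel fails in this way still needs to be checked against the actual quantize and dequant code.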