Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks
Repo: TheTom/turboquant_plus by TheTom
## CRITICAL: Perplexity test reveals quality failure
| Cache | PPL | vs f16 |
|-------|-----|--------|
| f16 | 6.121 | baseline |
| q8_0 | 6.111 | -0.16% |
| q4_0 | 6.142 | +0.34% |
| **turbo3** | **165.6** | **+2607%** ❌ |
turbo3 perplexity is about 27× that of f16 (165.6 vs 6.121), while q8_0 and q4_0 stay within 0.4% of baseline. The speed benchmarks were measuring how fast the model produces wrong answers.
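For reference, perplexity is the exponential of the mean per-token negative log-likelihood, and the "vs f16" column is the relative change against the f16 run. Below is a minimal sketch of both calculations. The helper names are illustrative and do not come from the repo. Recomputing from the rounded table values gives a turbo3/f16 ratio of about 27.05×, which matches the 27× figure.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated tokens.

    token_logprobs: natural-log probabilities the model assigned to each true next token.
    """
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def pct_vs_baseline(ppl, baseline_ppl):
    """Relative perplexity change versus a baseline cache type, in percent."""
    return (ppl / baseline_ppl - 1.0) * 100.0

# Rounded PPL values from the table above.
F16_PPL = 6.121
RESULTS = {"q8_0": 6.111, "q4_0": 6.142, "turbo3": 165.6}

for cache, ppl in RESULTS.items():
    print(f"{cache:>7}: {pct_vs_baseline(ppl, F16_PPL):+9.2f}%  ({ppl / F16_PPL:.2f}x f16)")
```

A uniform 1-in-4 guess on every token gives a perplexity of exactly 4. This is a useful sanity anchor: a PPL of 165.6 means the model behaves as if it were choosing among roughly 165 equally likely tokens at each step.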
Root cause investigation needed. DO NOT update README with speed claims until quality is fixed.
Suspected causes:
1. Norm mismatch: quantize stores the norm of the full 128-element group, but dequant uses it as a per-32-element block norm.
2. Rotation mismatch: the rotation matrix used to pre-rotate queries may not match the rotation applied at quantize time.
3. A 3-bit packing bug with block size 32.
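Suspected cause 1 can be checked in isolation with a toy model. The sketch assumes that dequant rebuilds each 32-element block as a unit direction times a stored norm. The helper names are hypothetical, and this is not the turbo3 kernel code. The four block norms satisfy Σ‖b_i‖² = ‖g‖², so reusing the group norm ‖g‖ for every block inflates each block by ‖g‖/‖b_i‖, which is typically about √(128/32) = 2.

```python
import math
import random

GROUP = 128  # elements covered by the norm that quantize stores (per the issue)
BLOCK = 32   # elements per block on the dequant side (per the issue)

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def split_blocks(v, size=BLOCK):
    return [v[i:i + size] for i in range(0, len(v), size)]

def dequant_blocks(group, stored_norms):
    """Toy dequant: rebuild each block as its exact unit direction times the stored norm.

    Directions are not quantized here, so only the norm bookkeeping is under test.
    """
    out = []
    for block, norm in zip(split_blocks(group), stored_norms):
        block_norm = l2(block)
        out.extend(x / block_norm * norm for x in block)
    return out

def rel_error(ref, approx):
    return l2([a - r for a, r in zip(approx, ref)]) / l2(ref)

random.seed(0)
g = [random.gauss(0.0, 1.0) for _ in range(GROUP)]

consistent = dequant_blocks(g, [l2(b) for b in split_blocks(g)])  # one norm per 32-block
mismatched = dequant_blocks(g, [l2(g)] * (GROUP // BLOCK))        # 128-group norm reused per block

print(f"per-block norms  : relative error {rel_error(g, consistent):.1e}")
print(f"group norm reused: relative error {rel_error(g, mismatched):.2f}")
```

With four blocks, Cauchy-Schwarz gives Σ‖b_i‖ ≤ 2‖g‖. The mismatched reconstruction therefore always has a relative error of at least 1.0 (100%), no matter how good the 3-bit codebook is. An error of that size would be consistent with a catastrophic perplexity result rather than a small regression.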
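Suspected cause 2 rests on a simple identity: for an orthogonal R, (Rq)·(Rk) = q·k. Pre-rotating queries is therefore only valid if the query side uses exactly the matrix that rotated the keys. The sketch below uses a seeded Gram-Schmidt rotation as a stand-in for whatever rotation turbo3 actually uses (an assumption), and the helper names are hypothetical. It compares attention scores under three conditions: the same R, R^T, and a matrix from a different seed. Quantization is left out, so any score error comes from the rotation alone.

```python
import math
import random

def random_orthogonal(n, seed):
    """Hypothetical helper: seeded random orthogonal matrix via Gram-Schmidt."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for r in rows:  # remove components along already-accepted rows
            d = sum(a * b for a, b in zip(v, r))
            v = [a - d * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

DIM = 16
rng = random.Random(1)
q = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
keys = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(8)]

R = random_orthogonal(DIM, seed=42)   # rotation applied to K before quantization
k_rot = [matvec(R, k) for k in keys]  # rotated keys as the cache would hold them

def scores(q_rot):
    return [dot(q_rot, k) for k in k_rot]

exact = [dot(q, k) for k in keys]
matched = scores(matvec(R, q))                                  # same R: q.k preserved
wrong_t = scores(matvec(transpose(R), q))                       # R^T applied instead of R
wrong_seed = scores(matvec(random_orthogonal(DIM, seed=7), q))  # a different rotation

print("matched R   :", max_abs_diff(exact, matched))
print("R transposed:", max_abs_diff(exact, wrong_t))
print("other seed  :", max_abs_diff(exact, wrong_seed))
```

In this check, only the matched case should agree with the unrotated scores to floating-point precision. A transposed matrix or a seed mismatch between the quantize path and the query path produces scores that are essentially unrelated to q·k.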
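For suspected cause 3, note that 32 codes × 3 bits = 96 bits = 12 bytes. With a contiguous bit stream, 8 of the 32 codes (those starting at bit offset 6 or 7 within a byte) straddle a byte boundary. The sketch below assumes an LSB-first bit-stream layout, which may not be the layout turbo3 uses, and all function names are hypothetical. It runs a round-trip test and shows how a decoder that ignores straddling codes fails.

```python
BITS = 3
BLOCK = 32
PACKED_BYTES = BLOCK * BITS // 8  # 32 codes x 3 bits = 96 bits = 12 bytes

def pack3(codes):
    """Pack 32 codes in [0, 8) into 12 bytes as an LSB-first bit stream (assumed layout)."""
    if len(codes) != BLOCK or any(not 0 <= c < 8 for c in codes):
        raise ValueError("expected 32 codes in range 0..7")
    out = bytearray(PACKED_BYTES)
    for i, code in enumerate(codes):
        for j in range(BITS):
            bit = i * BITS + j
            if (code >> j) & 1:
                out[bit // 8] |= 1 << (bit % 8)
    return bytes(out)

def unpack3(data):
    """Inverse of pack3: reads each code bit by bit, so codes may cross byte boundaries."""
    codes = []
    for i in range(BLOCK):
        code = 0
        for j in range(BITS):
            bit = i * BITS + j
            code |= ((data[bit // 8] >> (bit % 8)) & 1) << j
        codes.append(code)
    return codes

def unpack3_single_byte(data):
    """Deliberately buggy: reads each code from one byte, losing bits that spill into the next."""
    return [(data[(i * BITS) // 8] >> ((i * BITS) % 8)) & 0b111 for i in range(BLOCK)]

# Round-trip every code value at every position.
for value in range(8):
    for pos in range(BLOCK):
        codes = [0] * BLOCK
        codes[pos] = value
        assert unpack3(pack3(codes)) == codes

all_sevens = [7] * BLOCK
broken = sum(a != b for a, b in zip(unpack3_single_byte(pack3(all_sevens)), all_sevens))
print(f"{broken} of {BLOCK} codes corrupted by the single-byte decoder")
```

With every code set to 7, the buggy decoder corrupts exactly the 8 straddling codes. A test vector made mostly of small or zero codes can hide this kind of bug, so an all-ones pattern makes a cheap first check.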
State: Open • Comments: 9