Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks

Repo: TheTom/turboquant_plus by TheTom

Posted: Mar 25, 2026

## CRITICAL: Perplexity test reveals quality failure

| Cache | PPL | vs f16 |
|-------|-----|--------|
| f16 | 6.121 | baseline |
| q8_0 | 6.111 | -0.16% |
| q4_0 | 6.142 | +0.34% |
| **turbo3** | **165.6** | **+2607%** ❌ |

turbo3 perplexity is 27× worse than f16. Speed benchmarks were measuring how fast the model produces wrong answers.

Root cause investigation needed. DO NOT update README with speed claims until quality is fixed.

Suspected causes:

1. Norm mismatch: quantize stores full 128-element group norm, dequant uses it as per-32-block norm
2. Pre-rotate-queries rotation matrix mismatch with quantize rotation
3. 3-bit packing bug in block size 32
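To make suspected cause 1 concrete, here is a minimal sketch of what a group-norm/block-norm mismatch would do to reconstructed values. It is not the turbo3 code: `dequant_block` and the unit-norm block representation are hypothetical stand-ins, and the sketch assumes the stored norm is an L2 norm that dequantization multiplies back in. Only the sizes 128 and 32 come from the comment above.

```python
import math
import random

GROUP = 128  # elements covered by the stored norm (from the comment above)
BLOCK = 32   # elements per dequantization block (from the comment above)


def l2_norm(xs):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in xs))


def dequant_block(unit_block, norm):
    """Hypothetical dequantization: rescale a unit-norm block by a norm."""
    return [v * norm for v in unit_block]


random.seed(0)
group = [random.gauss(0.0, 1.0) for _ in range(GROUP)]
group_norm = l2_norm(group)

# Split the 128-element group into four 32-element blocks.
blocks = [group[i:i + BLOCK] for i in range(0, GROUP, BLOCK)]

correct, mismatched = [], []
for block in blocks:
    block_norm = l2_norm(block)
    unit = [v / block_norm for v in block]
    # Consistent pairing: the block is rescaled by its own norm.
    correct.extend(dequant_block(unit, block_norm))
    # Suspected bug: the block is rescaled by the whole group's norm.
    mismatched.extend(dequant_block(unit, group_norm))

# Each of the 4 blocks now carries the full group norm, so the
# reconstructed vector is sqrt(128 / 32) = 2 times too long.
inflation = l2_norm(mismatched) / group_norm
max_error_correct = max(abs(a - b) for a, b in zip(correct, group))
```

Under these assumptions the mismatched path inflates the vector's overall norm by a factor of sqrt(128/32) = 2 and distorts the relative scale of the four blocks, while the consistent path reproduces the input up to floating-point rounding. Whether an error of this kind accounts for the perplexity jump in the table is exactly what the root cause investigation has to establish.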

GitHub Issue

State: Open • Comments: 9

Other Comments / Reviews

Good question. turbo3 at +1.4% vs q8_0 is worse than q4_0...

by TheTom Mar 26, 2026
How is turbo3 being worse than q4 quality target met?

by Rotatingxenomorph Mar 26, 2026
## QUALITY FIXED ✅ Perplexity with inverse rotation res...

by TheTom Mar 25, 2026
## Root causes found ### 1. V cache in rotated space Pyt...

by TheTom Mar 25, 2026