Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks

Repo: TheTom/turboquant_plus by TheTom
Posted: Mar 25, 2026
## QUALITY FIXED ✅

Perplexity with the inverse rotation restored in dequant:

| Cache | PPL | vs q8_0 |
|-------|-----|---------|
| f16 | 6.121 | — |
| q8_0 | 6.111 | baseline |
| q4_0 | 6.142 | +0.5% |
| **turbo3** | **6.194** | **+1.4%** |

turbo3 is within 1.4% of q8_0 perplexity, so the quality target is met.

Speed has dropped back to ~10.7 tok/s (the pre-optimization level) because the inverse rotation now runs in the dequant hot path. The pre-rotate-queries optimization needs to be reimplemented to work with the GQA head layout (`ne[0]=256` for concatenated heads) and with hybrid memory types.
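A minimal sketch of why rotating the query once can stand in for the per-key inverse rotation in dequant. It assumes, hypothetically, that the rotation is an orthonormal Walsh-Hadamard transform applied independently to each head block, and that rotated keys are stored with a simple uniform 3-bit quantizer. Neither choice is taken from turboquant_plus, and the quantizer is not the turbo3 format. Because the rotation R is symmetric and orthogonal, q · R x = (R q) · x, so both paths give the same attention score up to floating-point error. Under these assumptions the sketch also shows that the query must be rotated with the same per-head block layout as the keys. Rotating the full concatenated row instead gives a different score.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform (len(x) must be a power of two).
    The normalized transform is symmetric and orthogonal, so it is its own inverse."""
    x = list(x)
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def rotate_heads(row, head_dim):
    """Rotate each head_dim-sized block of a row of concatenated heads separately.
    The per-head block layout is a hypothetical choice for illustration."""
    assert len(row) % head_dim == 0
    out = []
    for start in range(0, len(row), head_dim):
        out.extend(fwht(row[start:start + head_dim]))
    return out


def quantize(x, bits=3):
    """Hypothetical symmetric uniform quantizer with one scale per row.
    Stand-in only: this is NOT the turbo3 codebook."""
    levels = 2 ** (bits - 1) - 1  # codes in [-3, 3] for 3 bits
    scale = max(abs(v) for v in x) / levels or 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in x]
    return codes, scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
HEAD_DIM = 128  # hypothetical head size
N_HEADS = 2     # two heads concatenated -> row length 256
k = [random.gauss(0.0, 1.0) for _ in range(HEAD_DIM * N_HEADS)]
q = [random.gauss(0.0, 1.0) for _ in range(HEAD_DIM * N_HEADS)]

# The cache stores quantized *rotated* keys.
codes, scale = quantize(rotate_heads(k, HEAD_DIM))
k_rot_hat = dequantize(codes, scale)

# Path A: inverse rotation inside dequant (runs once per cached key).
k_hat = rotate_heads(k_rot_hat, HEAD_DIM)  # the rotation is self-inverse
score_a = dot(q, k_hat)

# Path B: rotate the query once, then dot with the dequantized rotated keys.
score_b = dot(rotate_heads(q, HEAD_DIM), k_rot_hat)

# Mismatched layout: rotating the whole 256-wide row instead of per head block.
score_bad = dot(rotate_heads(q, HEAD_DIM * N_HEADS), k_rot_hat)

print(f"A={score_a:.6f}  B={score_b:.6f}  mismatched={score_bad:.6f}")
```

Path B moves the rotation cost from every dequantized key to a single transform per query, which is the trade-off the comment describes.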