Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks

Repo: TheTom/turboquant_plus by TheTom

Posted: Mar 25, 2026

## QUALITY FIXED ✅

Perplexity with inverse rotation restored in dequant:

| Cache | PPL | vs q8_0 |
|-------|-----|---------|
| f16 | 6.121 | — |
| q8_0 | 6.111 | baseline |
| q4_0 | 6.142 | +0.5% |
| **turbo3** | **6.194** | **+1.4%** |

turbo3 is within 1.4% of q8_0 perplexity. Quality target met.

Speed is back to ~10.7 tok/s (the pre-optimization level) because the inverse rotation is in the dequant hot path. The pre-rotate-queries optimization needs to be reimplemented to work with the GQA head layout (ne[0]=256 for concatenated heads) and with hybrid memory types.
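A minimal NumPy sketch of why pre-rotating queries should be able to recover the speed without giving back quality: for an orthogonal rotation R, a key stored as R·k satisfies (R·k)·(R·q) = k·q. Rotating the single query once is therefore equivalent to inverse-rotating every dequantized key. The random QR-based rotation, the dimensions, and the rounding "quantizer" below are illustrative assumptions, not turbo3's actual kernels, and the sketch models a single head, ignoring the GQA layout (ne[0]=256) issue mentioned above.

```python
import numpy as np


def random_rotation(d: int, seed: int = 0) -> np.ndarray:
    """Random d x d orthogonal matrix from a QR decomposition.

    Assumption for illustration only: turbo3's real rotation may be a
    different (e.g. structured) transform.
    """
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flipping column signs keeps q orthogonal


def scores_inverse_rotate_keys(q: np.ndarray, k_hat: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Dequant-side path: undo the rotation on every cached key, then dot with q."""
    k = k_hat @ R  # each row was stored as (R @ k_i)^T, and (R @ k_i)^T @ R = k_i^T
    return k @ q   # the rotation step costs O(n * d^2) for n cached keys


def scores_pre_rotate_query(q: np.ndarray, k_hat: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Pre-rotate path: rotate the single query once, dot against rotated keys."""
    return k_hat @ (R @ q)  # the rotation step costs O(d^2)


d, n = 128, 16  # hypothetical head dim and number of cached tokens
rng = np.random.default_rng(1)
R = random_rotation(d)
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))

k_rot = K @ R.T                  # keys stored in rotated space (rows are R @ k_i)
k_hat = np.round(k_rot * 4) / 4  # crude stand-in quantizer, NOT turbo3's scheme

a = scores_inverse_rotate_keys(q, k_hat, R)
b = scores_pre_rotate_query(q, k_hat, R)
wrong = k_hat @ q                # dotting the raw query against rotated keys

print(np.allclose(a, b))         # True: both paths give the same scores
print(np.max(np.abs(wrong - a))) # large: skipping the inverse rotation breaks scores
```

Because the identity holds for any stored key values, including dequantized ones, the two paths agree up to floating-point rounding in this sketch; what matters for quality is that the rotation is undone consistently, not where it is undone.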


State: Open • Comments: 9

Other comments / reviews:

- TheTom, Mar 26, 2026: "Good question. turbo3 at +1.4% vs q8_0 is worse than q4_0..."
- Rotatingxenomorph, Mar 26, 2026: "How is turbo3 being worse than q4 quality target met?"
- TheTom, Mar 25, 2026: "## Root causes found ### 1. V cache in rotated space Pyt..."
- TheTom, Mar 25, 2026: "## CRITICAL: Perplexity test reveals quality failure | C..."