Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks
Repo: TheTom/turboquant_plus by TheTom
## QUALITY FIXED ✅
Perplexity with inverse rotation restored in dequant:
| Cache | PPL | vs q8_0 |
|-------|-----|---------|
| f16 | 6.121 | — |
| q8_0 | 6.111 | baseline |
| q4_0 | 6.142 | +0.5% |
| **turbo3** | **6.194** | **+1.4%** |
turbo3 perplexity is 1.4% above q8_0 (6.194 vs 6.111). Quality target met.
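
For anyone re-checking the "vs q8_0" column, here is a minimal sketch of the arithmetic, assuming the column is the plain relative change in perplexity against the q8_0 row. The helper name `pct_vs_baseline` is made up for illustration and is not part of the repo.

```python
# Perplexity values copied from the table above.
ppl = {"f16": 6.121, "q8_0": 6.111, "q4_0": 6.142, "turbo3": 6.194}


def pct_vs_baseline(value, baseline):
    """Relative change of `value` against `baseline`, in percent."""
    return (value / baseline - 1.0) * 100.0


baseline = ppl["q8_0"]
deltas = {name: pct_vs_baseline(v, baseline) for name, v in ppl.items()}

for name, d in deltas.items():
    # One decimal place, matching the precision used in the table.
    print(f"{name:>7}: {d:+.1f}%")
```

By the same formula the f16 row, which the table leaves blank, comes out at roughly +0.2% relative to q8_0.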
Speed is back to ~10.7 tok/s (the pre-optimization level) because the inverse rotation
now runs in the dequant hot path. The pre-rotate-queries optimization needs to be
reimplemented to work with the GQA head layout (ne[0]=256 for concatenated heads)
and with hybrid memory types.
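
To make the trade-off concrete, here is a small sketch of why pre-rotating queries can stand in for the inverse rotation in dequant. It assumes the cached keys are stored as `R k` for an orthogonal matrix `R`, so that `R^T` undoes the rotation. Quantization, the V side, and the GQA layout are all left out, and every name here (`random_rotation`, `cache`, and so on) is hypothetical rather than taken from the turbo3 kernels.

```python
import math
import random


def random_rotation(n, seed=0):
    """Build an n x n orthogonal matrix as a product of Givens rotations."""
    rng = random.Random(seed)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n - 1):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Left-multiply by a rotation acting on rows i and i + 1.
        for col in range(n):
            a, b = R[i][col], R[i + 1][col]
            R[i][col] = c * a - s * b
            R[i + 1][col] = s * a + c * b
    return R


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


n = 8  # toy head dimension, not the real one
rng = random.Random(1)
R = random_rotation(n)
q = [rng.gauss(0.0, 1.0) for _ in range(n)]
keys = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]

# Assumed cache contents: rotated keys (quantization omitted in this sketch).
cache = [matvec(R, k) for k in keys]

# Path A: inverse-rotate every cached key during dequant, then take q . k.
Rt = transpose(R)
scores_dequant = [dot(q, matvec(Rt, k_rot)) for k_rot in cache]

# Path B: rotate the query once and score against the rotated cache directly.
q_rot = matvec(R, q)
scores_prerot = [dot(q_rot, k_rot) for k_rot in cache]

# Reference: scores on the original, unrotated keys.
scores_ref = [dot(q, k) for k in keys]

max_diff = max(abs(a - b) for a, b in zip(scores_dequant, scores_prerot))
max_ref_diff = max(abs(a - b) for a, b in zip(scores_prerot, scores_ref))
print(max_diff, max_ref_diff)
```

Path A does one matrix-vector product per cached key, while Path B does one per query, which is presumably where the speed difference comes from. With concatenated heads the rotation would have to be applied per head-sized block of the query, and that is the part this sketch does not attempt.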
State: Open • Comments: 9