Comment on: Quality validation: perplexity, KL divergence, and NIAH benchmarks

Repo: TheTom/turboquant_plus by TheTom

Posted: Mar 25, 2026

## Root causes found

### 1. V cache in rotated space

Python verification: dequant output has cosine=0.02 with input (garbage). After inverse rotation: cosine=0.987 (correct). V cache values MUST be inverse-rotated after attention.

### 2. dynamic_cast fails for MoE models

The Qwen 3.5 MoE uses `llama_memory_hybrid_context`, not `llama_kv_cache_context`. Our `dynamic_cast` returns null → Q rotation and V inverse rotation NEVER execute. ALL speed benchmarks were on unrotated Q with rotated-space K/V: garbage results with fast speed.

### Why "coherent text" was misleading

Without any rotation applied, the raw quantize/dequant produces plausible-looking grammar but wrong content. Short conversations hide this. Perplexity caught it.

### Fix needed

Store rotation tensors in `llm_graph_context` directly (not behind a KV cache `dynamic_cast`). Then both Q rotation and V inverse rotation will work for ALL memory types.
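For anyone who wants to reproduce the effect in root cause 1 without the full build, here is a minimal stdlib-only sketch. It is not the turboquant_plus code: the rotation is assumed to be a sign-flipped Walsh-Hadamard transform, the quantizer is a toy symmetric 4-bit one, and every function name is hypothetical. It only shows that a vector read back from a rotated-space cache looks unrelated to the input until the inverse rotation is applied. The exact cosines will differ from the 0.02 and 0.987 reported above.

```python
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(v)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def rotate(x, signs):
    """Apply R = H * diag(signs) / sqrt(n), an orthogonal matrix (assumed rotation)."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * y for y in fwht([s * xi for s, xi in zip(signs, x)])]


def inverse_rotate(y, signs):
    """Apply R^T = diag(signs) * H / sqrt(n), which undoes rotate()."""
    scale = 1.0 / math.sqrt(len(y))
    return [s * scale * yi for s, yi in zip(signs, fwht(y))]


def quant_dequant(v, bits=4):
    """Toy symmetric uniform quantizer (a stand-in, not the real turbo format)."""
    levels = 2 ** (bits - 1) - 1
    step = max(abs(a) for a in v) / levels
    return [round(a / step) * step for a in v]


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))


rng = random.Random(0)
n = 128
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

stored = quant_dequant(rotate(v, signs))   # what the cache hands back
cos_rotated = cosine(v, stored)            # compared while still in rotated space
cos_restored = cosine(v, inverse_rotate(stored, signs))

print(f"still rotated:   cosine = {cos_rotated:.3f}")
print(f"inverse-rotated: cosine = {cos_restored:.3f}")
```

Because attention is a weighted sum over V rows, the output of attention over rotated-space V is itself still rotated, so the inverse rotation can be applied once to the attention output rather than to every cached row.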

State: Open • Comments: 9
