
Quality validation: perplexity, KL divergence, and NIAH benchmarks

TheTom/turboquant_plus
Status: Open
Opened: Mar 25, 2026
Comments: 9
## Supersedes #24

We claim 4.6× compression at 91-97% speed. But we have ZERO quantitative quality data on the llama.cpp build.

## Required benchmarks (in priority order):

### 1. Perplexity (wikitext-2)
- f16, q8_0, q4_0, q4_1, q5_0, turbo3
- Target: turbo3 within 1% of q8_0
- If >2% worse: quality problem

### 2. KL Divergence vs f16
- Required by llama.cpp CONTRIBUTING.md for new quant types
- Metrics: mean KLD, delta-p RMS, same-top-p %

### 3. Passkey Retrieval (NIAH)
- At 1K, 2K, 4K, 8K context lengths
- Prince Canuma got 6/6 at all lengths

### 4. Generation Quality (qualitative)
- Side-by-side comparison

## Tracking

Full plan and results in docs/quality-benchmarks.md
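
To make the perplexity thresholds and the three KLD metrics concrete, here is a rough sketch of how they could be computed from per-token logits. It is not the llama.cpp implementation: `kld_metrics` and `ppl_verdict` are hypothetical helper names, the metric definitions follow my reading of the llama.cpp perplexity tool (delta-p is taken as the change in probability of the correct next token, same-top-p as the share of positions where both models rank the same token first), and the "borderline" label for the 1-2% band is an assumption, since the issue only defines the target and the failure threshold.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token position."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kld_metrics(base_logits, quant_logits, targets):
    """Compare a quantized run against the f16 baseline.

    base_logits / quant_logits: one list of vocabulary logits per position.
    targets: the correct next-token id for each position.
    """
    klds, delta_ps, same_top = [], [], 0
    for b, q, t in zip(base_logits, quant_logits, targets):
        lp, lq = log_softmax(b), log_softmax(q)
        # KL(base || quant), summed over the vocabulary
        klds.append(sum(math.exp(a) * (a - c) for a, c in zip(lp, lq)))
        # change in probability assigned to the correct token
        delta_ps.append(math.exp(lq[t]) - math.exp(lp[t]))
        # do both models pick the same most likely token?
        same_top += lp.index(max(lp)) == lq.index(max(lq))
    n = len(klds)
    return {
        "mean_kld": sum(klds) / n,
        "delta_p_rms": math.sqrt(sum(d * d for d in delta_ps) / n),
        "same_top_p_pct": 100.0 * same_top / n,
    }


def ppl_verdict(ppl_candidate, ppl_reference, target=0.01, fail=0.02):
    """Classify the relative perplexity gap (e.g. turbo3 vs q8_0)."""
    gap = (ppl_candidate - ppl_reference) / ppl_reference
    if gap <= target:
        return "pass"
    if gap > fail:
        return "quality problem"
    return "borderline"  # assumption: the issue does not name this band


if __name__ == "__main__":
    # Made-up toy logits, only to show the call shape; not benchmark results.
    base = [[2.0, 0.0], [0.0, 2.0]]
    quant = [[0.0, 2.0], [0.0, 2.0]]
    print(kld_metrics(base, quant, targets=[0, 1]))
    print(ppl_verdict(6.2, 6.0))
```

In practice the logits would come from the f16 and turbo3 runs over the same wikitext-2 tokens; the numbers in the `__main__` block are placeholders.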
Python