turbo3 and turbo4 quantization implementation, specifically related to block size changes and kernel instantiation.
GitHub Issue
Mar 25, 2026
## Codex post-commit review found 3 bugs in block size 32 change:
### 1. CRITICAL: SET_ROWS kernel is turbo3-specific but still instantiated for turbo4
kernel_set_rows_turbo now hardcodes turbo3 packing (MSE-only, 3-bit split),
but it is still instantiated for turbo4, which uses a different block layout.
Result: turbo4 cache writes are corrupted.
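As a rough CPU-side illustration of why each type needs its own packing routine, here is a minimal C sketch that selects the packer and block geometry per type. Everything in it is hypothetical: the `turbo_type` enum, the `turbo_traits` table, the byte sizes, and the placeholder packers are invented for illustration, and the 128-element turbo4 block is an assumption. The real kernel is a GPU kernel whose layouts are not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; not the project's real enum. */
typedef enum { TYPE_TURBO3 = 0, TYPE_TURBO4 = 1, TYPE_COUNT } turbo_type;

typedef void (*pack_fn)(const float *src, uint8_t *dst);

typedef struct {
    size_t  block_elems; /* values per block */
    size_t  block_bytes; /* packed bytes per block (placeholder numbers) */
    pack_fn pack;        /* type-specific packing routine */
} turbo_traits;

/* Placeholder packers: they only write a tag byte so the dispatch is
 * observable. The real turbo3/turbo4 bit layouts are not reproduced. */
static void pack_turbo3(const float *src, uint8_t *dst) { (void)src; dst[0] = 3; }
static void pack_turbo4(const float *src, uint8_t *dst) { (void)src; dst[0] = 4; }

/* One entry per type, so turbo4 never runs through the turbo3 packer. */
static const turbo_traits TRAITS[TYPE_COUNT] = {
    [TYPE_TURBO3] = { 32,  16, pack_turbo3 }, /* 32 = QK_TURBO3 after the change */
    [TYPE_TURBO4] = { 128, 64, pack_turbo4 }, /* 128 is an assumption */
};

/* Packs a row of n floats block by block; returns the number of blocks written. */
static size_t set_row(turbo_type type, const float *src, size_t n, uint8_t *dst) {
    const turbo_traits *t = &TRAITS[type];
    assert(n % t->block_elems == 0);
    size_t n_blocks = n / t->block_elems;
    for (size_t b = 0; b < n_blocks; ++b) {
        t->pack(src + b * t->block_elems, dst + b * t->block_bytes);
    }
    return n_blocks;
}
```

With this shape, a 128-element row produces four turbo3 blocks or one turbo4 block, and each goes through its own packer.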
### 2. HIGH: SET_ROWS drops tail blocks when nk0 not multiple of 4
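Integer division n_groups = nk0 / 4 truncates, so remainder blocks are never processed.
For dk=192: nk0=6, so only 4 blocks are processed (the last 2 are dropped).
Only affects head dims where nk0 is not a multiple of 4 (dk not a multiple of 128). Qwen uses 128, so this doesn't bite us yet.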
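The arithmetic can be checked with a small C sketch. The function names and the `BLOCKS_PER_GROUP` macro are hypothetical, and the rounded-up loop is only one possible way to handle the remainder, not necessarily how the kernel should be restructured.

```c
#include <stddef.h>

#define BLOCKS_PER_GROUP 4 /* the kernel processes blocks in groups of 4 */

/* Truncating form described above: tail blocks are silently skipped. */
static size_t blocks_covered_truncating(size_t nk0) {
    size_t n_groups = nk0 / BLOCKS_PER_GROUP;
    return n_groups * BLOCKS_PER_GROUP;
}

/* Possible fix: round the group count up and clamp the last group. */
static size_t blocks_covered_with_tail(size_t nk0) {
    size_t n_groups = (nk0 + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP;
    size_t covered = 0;
    for (size_t g = 0; g < n_groups; ++g) {
        size_t first = g * BLOCKS_PER_GROUP;
        size_t count = nk0 - first; /* blocks left from this group onward */
        if (count > BLOCKS_PER_GROUP) {
            count = BLOCKS_PER_GROUP;
        }
        covered += count;
    }
    return covered;
}
```

For nk0=6 the truncating version covers 4 blocks and the rounded-up version covers all 6. For nk0=4 both cover 4, which is why a head dim of 128 does not expose the bug.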
### 3. CRITICAL: TURBO_D = QK_TURBO3 = 32 breaks turbo4 C code
The C code uses TURBO_D for array sizes, and TURBO_D is now 32 instead of 128.
Turbo4 CPU paths therefore have out-of-bounds array accesses.
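A minimal C sketch of the failure mode and a compile-time guard follows. `TURBO4_BLOCK_ELEMS` and `turbo4_sum_block` are hypothetical names, and the 128-element turbo4 block is an assumption. Only `TURBO_D` and `QK_TURBO3` come from the code under review.

```c
#include <stddef.h>

#define QK_TURBO3 32  /* turbo3 block size after the change */
#define TURBO_D   128 /* fixed value, no longer derived from QK_TURBO3 */

/* Hypothetical: number of elements a turbo4 CPU path touches per block. */
#define TURBO4_BLOCK_ELEMS 128

/* Fails the build if TURBO_D shrinks below what turbo4 needs, instead of
 * letting the loop below write past the end of the scratch array. */
_Static_assert(TURBO_D >= TURBO4_BLOCK_ELEMS,
               "TURBO_D must cover a full turbo4 block");

/* Stand-in for a turbo4 CPU routine that stages a block in a
 * TURBO_D-sized buffer. With TURBO_D == 32 the writes to scratch[32..127]
 * would be out of bounds. */
static float turbo4_sum_block(const float *src) {
    float scratch[TURBO_D];
    float sum = 0.0f;
    for (size_t i = 0; i < TURBO4_BLOCK_ELEMS; ++i) {
        scratch[i] = src[i];
        sum += scratch[i];
    }
    return sum;
}
```

Defining `TURBO_D` as `QK_TURBO3` in this sketch makes the static assertion fail at compile time, which is the point of the guard.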
## Fix
- #1: Separate turbo3 and turbo4 SET_ROWS kernel instantiations
- #2: Add assertion nk0 % 4 == 0 or handle remainder
- #3: Change TURBO_D to 128 (always), independent of QK_TURBO3
Adjacent Repository Pain Points
Other highly discussed features and pain points extracted from TheTom/turboquant_plus.