Comment on: No Difference in tokens/sec - Ministral3 8B Q5_K_M

Repo: TheTom/turboquant_plus by MrMuhannadObeidat

Posted: Mar 29, 2026

I missed the part where you point out that tokens/sec may actually degrade with the added KV cache compression. I tried turbo3 and do not see noticeable degradation, but I certainly see a major impact on memory consumption and on the ability to use much bigger context windows. My question now is: can you take advantage of this from within LM Studio or Ollama?
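
For a rough sense of why compressing the KV cache frees room for longer contexts, here is a back-of-the-envelope sketch. The model shape below is hypothetical and is not taken from Ministral3 8B. The 3 bits per value is only an assumption suggested by the name turbo3, not a confirmed figure. Per-block overhead such as scales is ignored.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size: keys and values for every layer and position."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = keys + values
    return values * bits_per_value / 8

# Hypothetical model shape, chosen only for illustration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

f16 = kv_cache_bytes(**shape, bits_per_value=16)
compressed = kv_cache_bytes(**shape, bits_per_value=3)  # assumed ~3 bits per value

print(f"f16 cache:    {f16 / 2**30:.2f} GiB")
print(f"~3-bit cache: {compressed / 2**30:.2f} GiB")
print(f"reduction:    {f16 / compressed:.1f}x")
```

Under these assumptions the cache shrinks by the ratio of the bit widths, so the same memory budget holds a proportionally longer context. Nothing in this estimate says anything about tokens/sec.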

GitHub Issue

Parent Entity

No Difference in tokens/sec - Ministral3 8B Q5_K_M

State: Open • Comments: 3

Other Comments / Reviews

You can just use it like Ollama or LM studio without the ...

by zekrom-vale Mar 31, 2026
This only compresses the KV cache, so it only increases the maximum inference context size...

by zrlhk Mar 31, 2026