Comment on: No Difference in tokens/sec - Ministral3 8B Q5_K_M
Repo: TheTom/turboquant_plus by MrMuhannadObeidat
I missed the part where you point out that tokens/sec may actually degrade with the added KV cache compression. I tried turbo3 and did not see any noticeable degradation, but I certainly see a major reduction in memory consumption, which makes much larger context windows usable.
My question now: can you take advantage of this from within LM Studio or Ollama?
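For a rough sense of why the memory savings translate into longer contexts, here is a back-of-the-envelope sketch of KV cache size as a function of bits per cached value. The model dimensions below are hypothetical placeholders, not the real Ministral3 8B config, and the 3 bits per value for turbo3 is only a guess from the name. The estimate also ignores any per-block scale or metadata overhead a real quantized cache would carry.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_value):
    """Approximate KV cache size: one K and one V tensor per layer."""
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return values * bits_per_value / 8


# Hypothetical dimensions for illustration only (not taken from the model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX = 32, 8, 128, 32768

GIB = 1024 ** 3
f16 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, 16)
q3 = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, 3)  # assumed ~3 bits

print(f"f16 cache:    {f16 / GIB:.2f} GiB")
print(f"~3-bit cache: {q3 / GIB:.2f} GiB")
```

Under these assumptions the cache shrinks by the ratio of the bit widths, so the same memory budget holds proportionally more context.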
GitHub Issue