Comment on: No Difference in tokens/sec - Ministral3 8B Q5_K_M

Repo: TheTom/turboquant_plus by zekrom-vale

Posted: Mar 31, 2026

You can just use it like Ollama or LM studio without the slow development or wrapper overheads. I use llama cpp directly and use it with router mode and many llms configured with models.ini and interface it with Open Web UI you can find others too. It's nice, but it has a lot more overhead in setting it up. Use `llama-server --host 0.0.0.0 --port 8080 --models-preset /path/to/ini/models.ini` and access it through an Open AI compatible UI like Open Web UI at `http://:8080`. Or 127.0.0.1 as the loop-back if not connecting to a different computer as I do. Here is an example model ini file I use for my 5070ti. [models.ini.txt](https://github.com/user-attachments/files/26391236/models.ini.txt)

GitHub Issue

Parent Entity

No Difference in tokens/sec - Ministral3 8B Q5_K_M

State: Open • Comments: 3

Other Comments / Reviews

这个只是对kv缓存压缩，所以只是提升了最大推理上下文的大小...

by zrlhk Mar 31, 2026
I missed the part where you highlight the fact that token...

by MrMuhannadObeidat Mar 29, 2026