Comment on: No Difference in tokens/sec - Ministral3 8B Q5_K_M
Repo: TheTom/turboquant_plus by zekrom-vale
You can just use it like Ollama or LM studio without the slow development or wrapper overheads. I use llama cpp directly and use it with router mode and many llms configured with models.ini and interface it with Open Web UI you can find others too. It's nice, but it has a lot more overhead in setting it up.
Use `llama-server --host 0.0.0.0 --port 8080 --models-preset /path/to/ini/models.ini` and access it through an Open AI compatible UI like Open Web UI at `http://:8080`. Or 127.0.0.1 as the loop-back if not connecting to a different computer as I do.
Here is an example model ini file I use for my 5070ti.
[models.ini.txt](https://github.com/user-attachments/files/26391236/models.ini.txt)
GitHub Issue
Other Comments / Reviews
SaaS Metrics