GitHub Issue
Doesn't work on Vulkan device
~/Scaricati/llama-cpp-turboquant/build/bin$ sudo ./llama-server -m /media/vincenzo/Dati/models/unsloth/Qwen3.5-27B-GGUF/Qwen3.5-27B-Q6_K.gguf -ctk turbo3 -ctv turbo3
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = AMD Ryzen 9 9900X 12-Core Processor (RADV RAPHAEL_MENDOCINO) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon RX 7900 XTX (RADV NAVI31) (radv) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: KHR_coopmat
main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
build: 8621 (a52586e2a) with GNU 15.2.0 for Linux x86_64
system info: n_threads = 12, n_threads_batch = 12, total_threads = 24
system_info: n_threads = 12 (n_threads_batch = 12) / 24 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
init: using 23 threads for HTTP server
start: binding port with default address family
main: loading model
srv load_model: loading model '/media/vincenzo/Dati/models/unsloth/Qwen3.5-27B-GGUF/Qwen3.5-27B-Q6_K.gguf'
common_init_result: fitting params to device memory, for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on
/home/vincenzo/Scaricati/llama-cpp-turboquant/ggml/src/ggml-backend.cpp:809...