GitHub Issue

Running on WSL2 Ubuntu 22.04 fails

Discovered On Apr 6, 2026

Primary Metric open

First, thanks for this. As litert still does not support native Windows, I tried to run it in WSL2 under Ubuntu 22.04.

```
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.5 LTS
Release: 22.04
Codename: jammy
```

Nvidia and CUDA seem to be working OK. Also the GPU is set to be used in rendering.

```
glxinfo | grep "OpenGL renderer"
OpenGL renderer string: D3D12 (NVIDIA GeForce RTX 4080)
```

```
vulkaninfo | grep "deviceName"
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
deviceName = Microsoft Direct3D12 (NVIDIA GeForce RTX 4080 GPU)
deviceName = llvmpipe (LLVM 15.0.7, 256 bits)
deviceName = Microsoft Direct3D12 (Intel(R) Iris(R) Xe Graphics)
```

nvidia-smi works as expected and I can see the GPU. I can run this command fine:

```
litert-lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--prompt="What is the capital of France?"
```

And I get a quick response: `The capital of France is **Paris**.`

To get parlor up and running, I ran:

1. uv sync in parlor's src directory, and I get the .venv setup OK.
2. uv run python server.py causes an error. I'm including the output below.

It looks to me that the CPU engine backend (software rendering) is being loaded, instead of the GPU's.

```
nemisis...
```


Developer & User Discourse

fikrikarim • Apr 6, 2026

Thanks for the detailed information. Unfortunately, I don't have a Windows machine myself, so it's hard for me to debug this. I'll give it a try, since several people seem interested in running this on Windows, although I can't promise anything right now.

fikrikarim • Apr 7, 2026

@yhdanid could you try running the `litert-lm` CLI with the gpu backend? The default backend is CPU.
```
litert-lm run \
--backend=gpu \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--prompt="What is the capital of France?"
```

If that works, try running `litert-lm benchmark` to double-check whether it's actually running on the GPU:
```
litert-lm benchmark \
--backend=gpu \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm
```

and compare with
```
litert-lm benchmark \
--backend=cpu \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm
```
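The GPU/CPU comparison above could also be scripted. This is a minimal sketch, assuming the `litert-lm` CLI is on `PATH` and accepts exactly the flags shown in the commands above. The helper names `build_benchmark_cmd` and `run_benchmark` are hypothetical, and the wall-clock time includes model download and loading, so treat it as a rough signal only.

```python
import subprocess
import time

# Repo and model file names are copied from the commands above.
REPO = "litert-community/gemma-4-E2B-it-litert-lm"
MODEL = "gemma-4-E2B-it.litertlm"


def build_benchmark_cmd(backend: str) -> list[str]:
    """Build the `litert-lm benchmark` command shown above for one backend."""
    return [
        "litert-lm",
        "benchmark",
        f"--backend={backend}",
        f"--from-huggingface-repo={REPO}",
        MODEL,
    ]


def run_benchmark(backend: str) -> tuple[int, float]:
    """Run the benchmark and return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(build_benchmark_cmd(backend))
    return result.returncode, time.perf_counter() - start
```

Calling `run_benchmark("gpu")` and then `run_benchmark("cpu")` gives an exit code and a duration per backend; a non-zero exit code on one backend only points at that backend rather than at the model or the download.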

yhdanid • Apr 7, 2026

### SUMMARY
I tried `--backend=cpu` and it works fine; it also seems to work when `--backend=gpu` is specified. Benchmarking, however, fails on the GPU backend but works fine on the CPU backend, so I'm confused.

Note that I did this both with the model downloaded locally to the /model directory (to save bandwidth while testing) and while downloading from Hugging Face. All of this was done in WSL2, Ubuntu 22.04.5 LTS.

**GPU IS ACCESSIBLE FROM TORCH**
As an experiment, I installed PyTorch in another virtual environment and tested whether it can access the GPU. I got these results:

```
(.venv) nemisis@nemisis-BLACK:/mnt/e/sandbox/parlor/test-cuda$ python torch_cuda_test.py
Is CUDA available? True
CUDA device count: 1
CUDA current device: 0
Torch CUDA device:
CUDA device name: NVIDIA GeForce RTX 4080 GPU
```
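The test script itself is not shown in the thread. A check along these lines could produce similar output; this is a sketch built on standard `torch.cuda` calls, not the actual `torch_cuda_test.py`, and `cuda_report` is a hypothetical helper name.

```python
def cuda_report() -> list[str]:
    """Return human-readable lines describing what PyTorch sees for CUDA."""
    try:
        import torch  # imported lazily so the function degrades gracefully
    except ImportError:
        return ["PyTorch is not installed in this environment"]

    available = torch.cuda.is_available()
    lines = [f"Is CUDA available? {available}"]
    if available:
        index = torch.cuda.current_device()
        lines.append(f"CUDA device count: {torch.cuda.device_count()}")
        lines.append(f"CUDA current device: {index}")
        lines.append(f"CUDA device name: {torch.cuda.get_device_name(index)}")
    return lines


print("\n".join(cuda_report()))
```

A passing check here only shows that the CUDA driver path works inside WSL2; it says nothing about other GPU APIs.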

**DOCKER**
I also tried seeing if parlor works in docker as it is more widely available than WSL (image built FROM nvidia/cuda:12.8.0-devel-ubuntu24.04) with more or les...

fikrikarim • Apr 11, 2026

Thanks for the additional information.

Could you try changing `litert_lm.Backend` to CPU for all backends in `server.py`?
```python
engine = litert_lm.Engine(
    MODEL_PATH,
    backend=litert_lm.Backend.CPU,
    vision_backend=litert_lm.Backend.CPU,
    audio_backend=litert_lm.Backend.CPU,
)
```
```

If that works, then there's a problem with the GPU inference on your setup.
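To switch between CPU and GPU without editing `server.py` each time, the backend could be read from an environment variable. This is only a sketch: the `PARLOR_BACKEND` variable and both helper names are hypothetical, `litert_lm.Backend.GPU` is assumed to exist alongside the `Backend.CPU` member used above, and the `Engine` arguments are copied from that snippet.

```python
import os


def backend_name(env=None) -> str:
    """Pick "CPU" or "GPU" from the hypothetical PARLOR_BACKEND variable (default CPU)."""
    env = os.environ if env is None else env
    name = env.get("PARLOR_BACKEND", "CPU").strip().upper()
    if name not in ("CPU", "GPU"):
        raise ValueError(f"unsupported backend: {name!r}")
    return name


def make_engine(model_path: str):
    """Create the engine with one backend for text, vision, and audio."""
    import litert_lm  # imported here so backend_name() works without it

    # Backend.CPU appears in the snippet above; Backend.GPU is an assumption.
    backend = getattr(litert_lm.Backend, backend_name())
    return litert_lm.Engine(
        model_path,
        backend=backend,
        vision_backend=backend,
        audio_backend=backend,
    )
```

With this in place, `PARLOR_BACKEND=gpu uv run python server.py` would exercise the GPU path, and leaving the variable unset keeps everything on the CPU.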

If you can isolate and reproduce the issue using only the Python `litert_lm` package, without anything else from this repo, it would probably be better to report it or ask for support in the main [LiteRT-LM repo](https://github.com/google-ai-edge/LiteRT-LM).

yhdanid • Apr 14, 2026

I looked into this briefly, and the problem does not seem to be specific to litert (which seems to rely on Vulkan for GPU hardware acceleration). llama.cpp's Vulkan implementation also won't use the dedicated NVIDIA GPU in WSL: even when I specify --device MY_VULKAN_DEVICE (for the NVIDIA GPU) in the command, it still falls back to the CPU. So this is not a litert, llama.cpp, or Parlor issue, but rather a Vulkan issue when running in WSL.

So I suggest closing this, as nothing can be done until an upstream solution is found for GPU-accelerated Vulkan in WSL.
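For anyone checking their own setup, the `vulkaninfo | grep "deviceName"` output from the issue already hints at this. The sketch below classifies each reported device by name prefix. The mapping is an assumption based on the `dzn` warning in that output: names starting with `Microsoft Direct3D12` are treated as the dzn driver (Vulkan translated to Direct3D 12) and `llvmpipe` as a CPU software rasterizer. `classify_devices` is a hypothetical helper name.

```python
# Sample taken from the `vulkaninfo | grep "deviceName"` output in the issue.
SAMPLE = """\
WARNING: dzn is not a conformant Vulkan implementation, testing use only.
deviceName = Microsoft Direct3D12 (NVIDIA GeForce RTX 4080 GPU)
deviceName = llvmpipe (LLVM 15.0.7, 256 bits)
deviceName = Microsoft Direct3D12 (Intel(R) Iris(R) Xe Graphics)
"""


def classify_devices(text: str) -> list[tuple[str, str]]:
    """Return (device name, kind) pairs for every `deviceName = ...` line."""
    devices = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "deviceName":
            continue  # skip warnings and unrelated lines
        name = value.strip()
        if name.startswith("Microsoft Direct3D12"):
            kind = "dzn (Vulkan translated to Direct3D 12)"
        elif name.startswith("llvmpipe"):
            kind = "software rasterizer (CPU)"
        else:
            kind = "other driver"
        devices.append((name, kind))
    return devices


for name, kind in classify_devices(SAMPLE):
    print(f"{kind}: {name}")
```

If no device falls into the "other driver" bucket, as in the sample, the only Vulkan paths to the GPU go through the non-conformant translation layer.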