GitHub Issue

Support native NVFP4 / ModelOpt checkpoints (e.g. Qwen3.5-9B-NVFP4)

Discovered On Mar 6, 2026

Primary Metric open

## Problem

OBLITERATUS currently appears to assume either:

- regular `torch_dtype=torch.float16` loading, or
- bitsandbytes 4-bit fallback (`BitsAndBytesConfig`, NF4)

That works for standard HF checkpoints and BnB quantization, but not for NVIDIA ModelOpt / NVFP4 checkpoints such as `AxionML/Qwen3.5-9B-NVFP4`.

## Why this matters

NVFP4 checkpoints are becoming a practical format for running stronger models on consumer GPUs. Right now, OBLITERATUS cannot be used directly on them, which blocks local abliteration workflows for users who specifically chose NVFP4 to fit within VRAM.

## Reproduction

Model tested:

- `AxionML/Qwen3.5-9B-NVFP4`

Environment:

- Arch Linux
- NVIDIA RTX 5090D, 32 GB VRAM
- local model path, no proxy, local-only workflow

Attempted command:

```bash
python -m obliteratus.cli obliterate /path/to/AxionML_Qwen3.5-9B-NVFP4 \
  --method optimized \
  --output-dir /path/to/output \
  --verify-sample-size 20
```

Observed result:

- initial missing dependency was resolved (`accelerate`)
- model load still failed with parameter shape mismatches rather than reaching the actual obliteration stage
- checkpoint config indicated native NVFP4 / ModelOpt quantization metadata
- installing related packages (`compressed-tensors`, `nvidia-modelopt`) was not sufficient to make the current OBLITERATUS loading path succeed

Representative failure pattern:

- layers expected tensors shaped like `... x 4096`
- checkpoint provided tensors shaped like `... x 2048`
- extra `wei...
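
One way the loading path could fail early instead of hitting shape mismatches is a pre-flight check on the checkpoint's declared quantization. The sketch below is only an illustration, not OBLITERATUS code: `detect_quant_method`, `check_loadable`, and `SUPPORTED_QUANT_METHODS` are hypothetical names, and it assumes the checkpoint declares its quantization under a `quantization_config` entry with a `quant_method` field in `config.json`, as Hugging Face quantized checkpoints commonly do. A ModelOpt export may keep this metadata elsewhere, so treat the lookup as an assumption to verify against the actual checkpoint.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical allow-list: quantization methods the loader is assumed to handle.
SUPPORTED_QUANT_METHODS = {"bitsandbytes"}


def detect_quant_method(model_dir: str) -> Optional[str]:
    """Return the quantization method declared in config.json, or None.

    Assumes the metadata lives under config["quantization_config"]["quant_method"].
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    quant_cfg = config.get("quantization_config")
    if not isinstance(quant_cfg, dict):
        return None  # no quantization metadata declared
    method = quant_cfg.get("quant_method")
    # Metadata is present but does not name a method.
    return str(method) if method is not None else "unknown"


def check_loadable(model_dir: str) -> None:
    """Raise a clear error before loading a checkpoint in an unsupported format."""
    method = detect_quant_method(model_dir)
    if method is not None and method not in SUPPORTED_QUANT_METHODS:
        raise ValueError(
            f"Checkpoint declares quant_method={method!r}, which this loading "
            f"path does not handle (supported: {sorted(SUPPORTED_QUANT_METHODS)})."
        )
```

Calling `check_loadable("/path/to/AxionML_Qwen3.5-9B-NVFP4")` before the model load would then surface an explicit "unsupported format" message rather than a parameter shape error, provided the checkpoint stores its metadata where this sketch looks for it.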


Developer & User Discourse

Vastopian • Mar 6, 2026

I don't think you abliterate quantized models. I'm pretty sure you need to abliterate first, then quantize to NVFP4. I think it only uses 4-bit for finding the refusals.

derekszen • Mar 6, 2026

just a convenience thing tbh