Gemini Executive Synthesis

OBLITERATUS support for native NVFP4 / ModelOpt checkpoints.

Technical Positioning
Expanding OBLITERATUS's compatibility to include emerging, VRAM-efficient quantization formats like NVFP4, enabling users to run 'stronger models on consumer GPUs' and facilitating local 'abliteration workflows'.
SaaS Insight & Market Implications
This issue identifies a critical compatibility gap in OBLITERATUS: its inability to support native NVFP4 / ModelOpt checkpoints. This format is crucial for running 'stronger models on consumer GPUs' by optimizing VRAM usage. The current system, designed for `torch.float16` or `bitsandbytes` quantization, fails to load NVFP4 models due to 'parameter shape mismatches.' This directly blocks local 'abliteration workflows' for a growing segment of users leveraging these efficient formats. The lack of support for modern quantization techniques limits OBLITERATUS's market relevance and accessibility, particularly for users seeking to maximize performance on constrained hardware. This represents a significant technical debt impacting product utility and adoption.
Proprietary Technical Taxonomy
NVFP4, ModelOpt checkpoints, torch_dtype=torch.float16, bitsandbytes 4-bit fallback, BitsAndBytesConfig, NF4, HF checkpoints, BnB quantization

Raw Developer Origin & Technical Request

Source: GitHub Issue, Mar 6, 2026
Repo: elder-plinius/OBLITERATUS
Support native NVFP4 / ModelOpt checkpoints (e.g. Qwen3.5-9B-NVFP4)

## Problem

OBLITERATUS currently appears to assume either:
- regular `torch_dtype=torch.float16` loading, or
- bitsandbytes 4-bit fallback (`BitsAndBytesConfig`, NF4)

That works for standard HF checkpoints and BnB quantization, but not for NVIDIA ModelOpt / NVFP4 checkpoints such as `AxionML/Qwen3.5-9B-NVFP4`.

## Why this matters

NVFP4 checkpoints are becoming a practical format for running stronger models on consumer GPUs. Right now, OBLITERATUS cannot be used directly on them, which blocks local abliteration workflows for users who specifically chose NVFP4 to fit within VRAM.
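A rough back-of-envelope calculation shows why 4-bit formats matter at this model scale. The figures below are weight-only estimates (they ignore KV cache and activations), and the ~4.5 bits/param for NVFP4 is an assumption that folds in per-block scale overhead:

```python
def weight_footprint_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight-only memory footprint in GiB."""
    return n_params * bits_per_param / 8 / 2**30

params = 9e9  # ~9B parameters, e.g. a Qwen3.5-9B-class model
print(f"fp16:  {weight_footprint_gib(params, 16):.1f} GiB")
print(f"nvfp4: {weight_footprint_gib(params, 4.5):.1f} GiB")  # ~4.5 bits incl. scales (assumed)
```

On a 32 GB card, the fp16 weights alone leave little headroom once activations and the abliteration machinery are added, whereas the 4-bit footprint does.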

## Reproduction

Model tested:
- `AxionML/Qwen3.5-9B-NVFP4`

Environment:
- Arch Linux
- NVIDIA RTX 5090D, 32 GB VRAM
- local model path, no proxy, local-only workflow

Attempted command:
```bash
python -m obliteratus.cli obliterate /path/to/AxionML_Qwen3.5-9B-NVFP4 \
--method optimized \
--output-dir /path/to/output \
--verify-sample-size 20
```

Observed result:
- an initial missing dependency (`accelerate`) was resolved by installing it
- model load still failed with parameter shape mismatches rather than reaching the actual obliteration stage
- checkpoint config indicated native NVFP4 / ModelOpt quantization metadata
- installing related packages (`compressed-tensors`, `nvidia-modelopt`) was not sufficient to make the current OBLITERATUS loading path succeed

Representative failure pattern:
- layers expected tensors shaped like `... x 4096`
- checkpoint provided tensors shaped like `... x 2048`
- extra `wei...
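The halved last dimension is consistent with FP4 storage: two 4-bit codes are packed into each uint8 byte, so a weight with logical shape `... x 4096` is stored as `... x 2048` packed bytes, alongside separate scale tensors. The sketch below illustrates that packing in general terms; it is not OBLITERATUS's or ModelOpt's actual layout.

```python
import numpy as np

def pack_fp4(nibbles: np.ndarray) -> np.ndarray:
    """Pack 4-bit codes (values 0..15) two-per-byte along the last axis,
    halving that dimension."""
    assert nibbles.shape[-1] % 2 == 0
    lo = nibbles[..., 0::2].astype(np.uint8)
    hi = nibbles[..., 1::2].astype(np.uint8)
    return lo | (hi << 4)

def unpack_fp4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_fp4: recover the 4-bit codes, doubling the last axis."""
    out = np.empty(packed.shape[:-1] + (packed.shape[-1] * 2,), dtype=np.uint8)
    out[..., 0::2] = packed & 0x0F
    out[..., 1::2] = packed >> 4
    return out

codes = np.random.randint(0, 16, size=(8, 4096), dtype=np.uint8)
packed = pack_fp4(codes)  # shape (8, 2048): the same halving seen in the error
assert np.array_equal(unpack_fp4(packed), codes)
```

A loader that treats the packed tensor as plain fp16 weights will inevitably report the observed shape mismatch: dequantization (unpack plus scale multiplication) has to happen before the tensors match the model's expected dimensions.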

Developer Debate & Comments

No active discussions extracted for this entry yet.

Adjacent Repository Pain Points

Other highly discussed features and pain points extracted from elder-plinius/OBLITERATUS.

Extracted Positioning
- OBLITERATUS local app CLI startup on macOS: ensuring a smooth, functional first-time setup and execution of the OBLITERATUS local app CLI on macOS.
- OBLITERATUS model weight modification process (EXCISE): ensuring robust and type-safe weight modification during the 'obliteration' process, preventing fundamental data type casting errors.
- OBLITERATUS GPU detection and utilization: leveraging dedicated GPU hardware (RTX 3060 12GB) for accelerated model processing, moving beyond CPU-only operation.
- OBLITERATUS UI App GPU utilization: maximizing GPU resource utilization for efficient model processing within the OBLITERATUS UI, ensuring optimal performance for users with dedicated hardware.
- OBLITERATUS chat functionality post-model obliteration: providing functional chat interaction with 'obliterated' models, enabling users to validate and utilize the processed models effectively.

Engagement Signals

Replies: 2
Issue Status: open

Cross-Market Term Frequency

Quantifies the cross-market adoption of foundational terms like tensors and VRAM by tracking occurrence frequency across active SaaS architectures and enterprise developer debates.