How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
Use Docker
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF:BF16
Quick Links

Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF

Qwen3-VL-8B-Instruct-Unredacted-MAX is a state-of-the-art and unredacted evolution of the original Qwen3-VL-8B-Instruct model, carefully fine-tuned using advanced abliterated training strategies that are explicitly designed to reduce or eliminate internal refusal mechanisms which typically restrict the output of conventional vision-language models, while simultaneously preserving and enhancing the model’s intrinsic multimodal reasoning and instruction-following capabilities; as an 8-billion-parameter system, it is capable of understanding and processing highly complex visual inputs and producing unrestricted, richly detailed, contextually nuanced captions, explanations, and analyses across a wide array of domains including artistic, technical, scientific, forensic, and abstract content, enabling use cases such as high-fidelity data annotation, accessibility improvement, creative and narrative storytelling, historical or medical dataset curation, and thorough red-teaming or bias evaluation research, all while balancing computational efficiency, output precision, and interpretability, making it an ideal tool for researchers, developers, and professionals seeking a powerful, unfiltered, and versatile vision-language model that can reason deeply, follow complex instructions, and generate highly descriptive, human-like responses across diverse multimodal tasks.

Qwen3-VL-8B-Instruct-Unredacted-MAX [GGUF]

File Name Quant Type File Size File Link
Qwen3-VL-8B-Instruct-Unredacted-MAX.BF16.gguf BF16 16.4 GB Download
Qwen3-VL-8B-Instruct-Unredacted-MAX.Q8_0.gguf Q8_0 8.71 GB Download
Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-bf16.gguf mmproj-bf16 1.16 GB Download
Qwen3-VL-8B-Instruct-Unredacted-MAX.mmproj-q8_0.gguf mmproj-q8_0 752 MB Download

Quants Usage

(sorted by size, not necessarily quality. IQ-quants are often preferable over similar sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

image.png

Downloads last month
546
GGUF
Model size
8B params
Architecture
qwen3vl
Hardware compatibility
Log In to add your hardware

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF

Collection including prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX-GGUF