Qwen3.5-122B-A10B-NVFP4

This is an NVFP4-quantized version of Qwen/Qwen3.5-122B-A10B.

Please use a nightly build of vLLM to run this model.
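
A minimal offline-inference sketch, assuming a nightly vLLM build is installed (e.g. `pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly`); the prompt and sampling settings are illustrative:

```python
# Illustrative only: loads the quantized checkpoint with vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(model="Sehyo/Qwen3.5-122B-A10B-NVFP4")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain NVFP4 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```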

Changelog

  • 02/03/2026: Added MTP (multi-token prediction) weights from source checkpoint, enabling speculative decoding with vLLM.
  • 25/02/2026: Initial upload.
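
With the MTP weights in place, speculative decoding can be enabled through vLLM's `speculative_config`. A hedged sketch; the method name and draft-token count below are assumptions, so check the vLLM speculative decoding docs for the exact values for this architecture:

```python
# Illustrative only: draft tokens come from the checkpoint's own MTP head.
from vllm import LLM

llm = LLM(
    model="Sehyo/Qwen3.5-122B-A10B-NVFP4",
    speculative_config={
        "method": "mtp",              # assumed method name for MTP drafting
        "num_speculative_tokens": 1,  # MTP heads typically draft one token
    },
)
```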

Creation

This model was created using vLLM's LLM Compressor, with Qwen3.5 MoE support added via PR #2383. The PR adds a custom CalibrationQwen3MoeSparseMoeBlock that routes calibration data to all experts during quantization, ensuring every expert receives proper calibration statistics for accurate NVFP4 quantization.
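
The idea behind all-expert calibration routing can be shown with a minimal numpy sketch (this is not the PR's actual code, and the function names are invented for illustration): under normal top-k routing, rarely selected experts may see almost no calibration tokens, while calibration-mode routing sends every token to every expert.

```python
import numpy as np

def route_topk(logits, k=2):
    """Standard sparse MoE routing: each token visits only its top-k experts."""
    return np.argsort(logits, axis=-1)[:, -k:]

def route_all(logits):
    """Calibration-style routing (the idea behind the PR's
    CalibrationQwen3MoeSparseMoeBlock, sketched here): every token is sent
    to every expert so each expert accumulates calibration statistics."""
    n_tokens, n_experts = logits.shape
    return np.tile(np.arange(n_experts), (n_tokens, 1))

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))  # 4 calibration tokens, 8 experts

topk = route_topk(logits)       # shape (4, 2): some experts may never be hit
everything = route_all(logits)  # shape (4, 8): every expert gets calibrated
```

During quantization this trades extra compute for coverage: the quantizer observes realistic activation ranges for all experts, not just the frequently routed ones.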

Checkpoint: safetensors format, 71B params, tensor types F32, BF16, F8_E4M3, U8.
