This is an MXFP4_MOE quantization of the model Mistral-Small-4-119B-2603.

  • Download the latest llama.cpp to use it.
  • For the mmproj file, the F32 version is recommended for best results (quality order: F32 > BF16 > F16).
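As a rough illustration, the sketch below fetches the quantized model and the F32 mmproj from this repository with the `huggingface_hub` Python package. The filenames are placeholders, not the actual names in this repo; check the repository's file listing, since large GGUFs may also be split into multiple parts.

```python
from huggingface_hub import hf_hub_download

REPO_ID = "noctrex/Mistral-Small-4-119B-2603-MXFP4_MOE-GGUF"

# NOTE: the filenames below are placeholders; check the repository's
# file list for the exact model and mmproj filenames.
model_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="Mistral-Small-4-119B-2603-MXFP4_MOE.gguf",
)
mmproj_path = hf_hub_download(
    repo_id=REPO_ID,
    filename="mmproj-F32.gguf",
)

# The resulting paths can then be handed to a recent llama.cpp build,
# e.g. llama-server -m <model_path> --mmproj <mmproj_path>.
print(model_path)
print(mmproj_path)
```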

The mainline standard is to use MXFP4 for the MoE tensors and Q8 for the rest.
So I created a new variant where the non-MoE tensors are kept in BF16 instead of Q8.
On some architectures BF16 will be slower, but it gives the highest quality: those tensors are essentially the original model's tensors copied over unquantized.
That is how the model here has been quantized.
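To see how this split looks inside the file, here is a small sketch (assuming the `gguf` Python package that ships with llama.cpp) that tallies the tensors of the downloaded GGUF by quantization type. In a file quantized this way, the MoE expert tensors should show up as MXFP4 and most of the remaining tensors as BF16.

```python
from collections import Counter

from gguf import GGUFReader  # pip install gguf

# Path is a placeholder; point it at the downloaded GGUF file.
reader = GGUFReader("Mistral-Small-4-119B-2603-MXFP4_MOE.gguf")

# Count tensors by their quantization type (e.g. MXFP4, BF16, F32).
counts = Counter(t.tensor_type.name for t in reader.tensors)
for qtype, n in counts.most_common():
    print(f"{qtype:>8}: {n} tensors")
```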

  • Format: GGUF
  • Model size: 119B params
  • Architecture: mistral4