Instructions for using adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with libraries and local apps.

Libraries

How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
        }
      ]
    }
  }
}
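If ~/.pi/agent/models.json already exists with other providers, replacing its contents with the snippet above would drop them. The following is a minimal Python sketch that merges only the `mlx-lm` provider into the existing file. It assumes the file is a JSON object with a top-level `providers` map, as in the snippet; the helper names `merge_mlx_provider` and `update_models_json` are hypothetical and not part of Pi.

```python
import json
from pathlib import Path

MODEL_ID = "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"

# Provider entry copied from the models.json snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": MODEL_ID}],
}


def merge_mlx_provider(config: dict) -> dict:
    """Return a copy of a Pi models.json config with the "mlx-lm" provider
    added (or replaced); other providers are left untouched."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["mlx-lm"] = MLX_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_json(path: Path) -> None:
    """Read models.json if it exists, merge the provider in, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_mlx_provider(config), indent=2) + "\n")
```

Calling `update_models_json(Path.home() / ".pi" / "agent" / "models.json")` updates the file in place; running it again is harmless because the `mlx-lm` entry is simply overwritten.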

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit

Run Hermes

hermes

MLX LM

How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
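The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is listening on mlx_lm.server's default port, 8080, and returns the usual OpenAI-style `choices[0].message.content` field; `build_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"


def build_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        body = json.load(resp)
    # OpenAI-style response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` sends the same request as the curl command.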

Qwen3-Next-80B-A3B-Instruct-REAMv2 — MLX 3-bit

MLX 3-bit quantization of bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2) for Apple Silicon Macs.

This is the general-purpose instruct sibling of TomLucidor/Qwen3-Coder-Next-REAM-mlx-3Bit (coding variant).

Model Summary

Property	Value
Base model	Qwen/Qwen3-Next-80B-A3B-Instruct
REAM compression	bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2, 512→384 experts, 80B→60B params)
MLX quantization	3-bit with `mixed_3_6` preset (3-bit bulk, 6-bit for sensitive layers)
Average bits per weight	3.998
Size on disk	28 GB
Total parameters	60B
Active parameters per token	3B
Architecture	Hybrid attention (Gated DeltaNet + Gated Attention) with ultra-sparse MoE
Context length	262,144 tokens (native)
Target hardware	Apple Silicon Macs with 48GB+ unified memory

Compression Pipeline

Qwen3-Next-80B-A3B-Instruct (80B, 160GB bf16)
  → REAMv2 expert merging (60B, 120GB bf16) — by bknyaz
    → MLX 3-bit mixed_3_6 quantization (60B, 28GB) — this model
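The sizes at each stage follow from parameter count times bits per weight. The following is a small Python check that uses the rounded parameter counts above, 16 bits per weight for bf16, and the 3.998 average bits per weight from the model summary. `size_bytes` is a hypothetical helper, and the estimate ignores file metadata.

```python
GB = 10**9   # decimal gigabyte
GIB = 2**30  # binary gibibyte


def size_bytes(params: float, bits_per_weight: float) -> float:
    """Approximate checkpoint size in bytes: parameters x bits per weight / 8."""
    return params * bits_per_weight / 8


# (stage, parameter count, average bits per weight); bf16 is 16 bits per weight.
STAGES = [
    ("Original 80B bf16", 80e9, 16),
    ("REAMv2 60B bf16", 60e9, 16),
    ("MLX mixed_3_6", 60e9, 3.998),
]

for name, params, bpw in STAGES:
    size = size_bytes(params, bpw)
    print(f"{name:<18} {size / GB:6.1f} GB  {size / GIB:6.1f} GiB")
```

The bf16 stages come out at 160 GB and 120 GB in decimal units. The quantized estimate is about 30 GB, or roughly 27.9 GiB, which is consistent with the 28 GB on-disk figure if that figure is in binary units.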

REAMv2 details (from the source model card): the v2 compression used C=32 expert grouping and calibration data weighted 70% math / 30% code / 0% C4, and it preserves the MTP (Multi-Token Prediction) layer.

Quantization Details

Converted with mlx-lm v0.31.x using:

mlx_lm.convert \
  --hf-path bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM \
  --mlx-path ~/Qwen3-Next-REAM-Instruct-mlx-3bit \
  -q \
  --q-bits 3 \
  --q-group-size 64 \
  --quant-predicate mixed_3_6
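The 3.998 average is higher than the nominal 3 bits for two reasons: group-wise quantization also stores per-group parameters, and some tensors stay at 6-bit. The following is a back-of-the-envelope Python sketch. It assumes MLX's affine quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights (matching `--q-group-size 64` above), and it ignores the small tensors that are not quantized. `effective_bits` is a hypothetical helper.

```python
def effective_bits(bits: int, group_size: int = 64,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Storage cost per weight for group-wise affine quantization:
    the packed weight bits plus one scale and one bias shared by each group."""
    return bits + (scale_bits + bias_bits) / group_size


low = effective_bits(3)   # 3-bit bulk tensors
high = effective_bits(6)  # 6-bit "sensitive" tensors
reported_avg = 3.998      # average bits per weight from the model summary

# Solve low * (1 - f) + high * f = reported_avg for f, the share of weights at 6-bit.
high_fraction = (reported_avg - low) / (high - low)
print(f"3-bit: {low} bpw, 6-bit: {high} bpw, ~{high_fraction:.1%} of weights at 6-bit")
```

Under these assumptions about a sixth of the weights (about 16.6%) are stored at 6-bit. Treat this as an estimate, not a measured breakdown.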

The mixed_3_6 preset quantizes most weights to 3-bit while keeping more sensitive ones at 6-bit for better quality: the down projections (including the MoE expert down projections) and attention V projections in a subset of layers, plus the LM head.
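The following Python sketch shows how a mixed recipe of this kind might pick a bit width per tensor. The layer-selection rule is an assumption based on mlx-lm's mixed_* presets, so check the mlx-lm source for your installed version. The rule gives `down_proj` and `v_proj` the higher width in the first and last eighth of the layers and in every third layer in between, and always gives it to `lm_head`. `uses_high_bits` and `bits_for` are hypothetical names.

```python
def uses_high_bits(layer: int, num_layers: int) -> bool:
    """Assumed rule: the first and last eighth of the layers, plus every
    third layer in between, get the higher bit width."""
    return (
        layer < num_layers // 8
        or layer >= 7 * num_layers // 8
        or (layer - num_layers // 8) % 3 == 2
    )


def bits_for(path: str, num_layers: int, low: int = 3, high: int = 6) -> int:
    """Pick a bit width from a module path such as
    'model.layers.8.mlp.switch_mlp.down_proj'."""
    if "lm_head" in path:
        return high
    parts = path.split(".")
    # Paths without a numeric layer index (e.g. embeddings) count as layer 0.
    layer = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 0
    if ("v_proj" in path or "down_proj" in path) and uses_high_bits(layer, num_layers):
        return high
    return low


NUM_LAYERS = 48  # Qwen3-Next-80B-A3B layer count; assumed unchanged by REAM expert merging
high_layers = [i for i in range(NUM_LAYERS) if uses_high_bits(i, NUM_LAYERS)]
print(len(high_layers), "of", NUM_LAYERS, "layers keep 6-bit down/V projections")
```

With 48 layers this rule selects 24, so under this assumption roughly half of the MoE down projections stay at 6-bit while the gate and up projections are all 3-bit.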

Benchmark Results

Evaluated with lm-evaluation-harness, using its `local-chat-completions` backend against mlx_lm.server. Generation parameters: temperature=0.0, do_sample=False, batch_size=1.

Benchmark	This model (MLX 3-bit)	REAMv2 60B (bf16)	Original 80B (bf16)
GSM8K (0-shot, flexible-extract)	67.4	—	—
GSM8K (5-shot, flexible-extract)	84.6	78.1	78.6
IFEval (prompt-level strict)	82.8	—	—
IFEval (prompt-level loose)	88.5	—	—
IFEval (inst-level strict)	88.1	—	—
IFEval (inst-level loose)	92.1	—	—

REAMv2 and Original 80B scores are from the bknyaz model card. The model card reports IFEval as 93.4 for both REAMv2 and the original but does not specify which metric variant.

Note on GSM8K: Our 5-shot score (84.6) is higher than the model card's bf16 score (78.1). This is almost certainly due to differences in evaluation methodology (prompt template, generation parameters, lm-eval version), not the quantization improving quality. Treat these as reference points from our specific eval setup, not direct comparisons.

Usage

With mlx-lm

pip install mlx-lm
mlx_lm.chat --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit

As an OpenAI-compatible server

mlx_lm.server --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit --port 8080

With LM Studio

Load it as a local model in LM Studio using the MLX backend, pointing LM Studio at the downloaded model directory.

Memory Requirements

With 28GB of weights, this model needs approximately 48GB of unified memory to run comfortably at moderate context lengths. On a 48GB Apple Silicon Mac (M4 Pro, M4 Max, M5 Max, etc.), expect roughly 16-20GB to remain for the KV cache and OS overhead.
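Only one layer in four uses full (gated) attention, so the KV cache is about a quarter of what it would be if every layer used full attention. The following is a rough Python estimate. It assumes the upstream Qwen3-Next-80B-A3B attention configuration (48 layers, full attention every 4th layer, 2 KV heads of dimension 256), a 16-bit KV cache, a single sequence, and that REAM leaves the attention layers untouched. `kv_cache_bytes` is a hypothetical helper.

```python
# Assumed attention config for Qwen3-Next-80B-A3B (from the upstream Qwen model card).
NUM_LAYERS = 48
FULL_ATTENTION_INTERVAL = 4  # 3 Gated DeltaNet layers, then 1 Gated Attention layer
NUM_KV_HEADS = 2
HEAD_DIM = 256
BYTES_PER_VALUE = 2  # assumes a 16-bit (bf16/fp16) KV cache


def kv_cache_bytes(context_tokens: int) -> int:
    """Keys + values for the full-attention layers only. The Gated DeltaNet
    layers keep a fixed-size recurrent state instead, which is not counted here."""
    attn_layers = NUM_LAYERS // FULL_ATTENTION_INTERVAL
    per_token = 2 * attn_layers * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


for ctx in (8_192, 32_768, 131_072, 262_144):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / 2**30:5.2f} GiB")
```

Under these assumptions, even the full 262,144-token native context comes to about 6 GiB of KV cache, which fits within the headroom described above. Real usage also includes the fixed-size DeltaNet state, activations, and runtime overhead.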

Important Context

Qwen3-Next-80B-A3B was released in September 2025 as an experimental architecture preview. The current-generation model in this parameter class is Qwen3.5-35B-A3B (February 2026), which incorporates the same hybrid attention innovations with improved training, vision support, and overall better benchmarks. For most users, Qwen3.5-35B-A3B at 4-bit (~22GB) will be the better daily driver. This model is provided for users interested in the Qwen3-Next architecture specifically, or who want the larger 60B parameter count in a compact MLX format.

License

Apache 2.0 — same as the original Qwen/Qwen3-Next-80B-A3B-Instruct.
