- Libraries
- MLX
How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with MLX:
```python
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
        }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
Qwen3-Next-80B-A3B-Instruct-REAMv2 — MLX 3-bit
MLX 3-bit quantization of bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2) for Apple Silicon Macs.
This is the general-purpose instruct sibling of TomLucidor/Qwen3-Coder-Next-REAM-mlx-3Bit (coding variant).
Model Summary
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Next-80B-A3B-Instruct |
| REAM compression | bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM (REAMv2, 512→384 experts, 80B→60B params) |
| MLX quantization | 3-bit with mixed_3_6 preset (3-bit bulk, 6-bit for sensitive layers) |
| Average bits per weight | 3.998 |
| Size on disk | 28 GB |
| Total parameters | 60B |
| Active parameters per token | 3B |
| Architecture | Hybrid attention (Gated DeltaNet + Gated Attention) with ultra-sparse MoE |
| Context length | 262,144 tokens (native) |
| Target hardware | Apple Silicon Macs with 48GB+ unified memory |
Compression Pipeline
Qwen3-Next-80B-A3B-Instruct (80B, 160GB bf16)
→ REAMv2 expert merging (60B, 120GB bf16) — by bknyaz
→ MLX 3-bit mixed_3_6 quantization (60B, 28GB) — this model
REAMv2 details (from the source model card): The v2 compression used C=32 expert grouping, calibration data weighted 70% math / 30% code / 0% C4, and preserves the MTP (Multi-Token Prediction) layer.
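The sizes quoted in the pipeline can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a measurement: it uses the rounded parameter counts from this card (80B and 60B) and the reported average of 3.998 bits per weight, and it assumes the 28 GB figure may be expressed in binary gibibytes.

```python
# Back-of-the-envelope check of the sizes quoted in the pipeline.
# Parameter counts are the rounded figures from this card (80B, 60B),
# so every result here is approximate.

GB = 1e9       # decimal gigabyte
GIB = 2**30    # binary gibibyte (assumption: the 28 GB figure may use this unit)


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given average width."""
    return num_params * bits_per_weight / 8


original_bf16 = weight_bytes(80e9, 16)    # original model, 16 bits per weight
ream_bf16 = weight_bytes(60e9, 16)        # after REAMv2 expert merging
mlx_3bit = weight_bytes(60e9, 3.998)      # after mixed_3_6 quantization

print(f"Original bf16: {original_bf16 / GB:.0f} GB")
print(f"REAMv2 bf16:   {ream_bf16 / GB:.0f} GB")
print(f"MLX 3-bit:     {mlx_3bit / GB:.1f} GB ({mlx_3bit / GIB:.1f} GiB)")
```

Under these assumptions the bf16 figures come out at 160 GB and 120 GB, and the quantized weights at roughly 30 GB decimal, or about 27.9 GiB, which is consistent with the 28 GB size on disk.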
Quantization Details
Converted with mlx-lm v0.31.x using:
```shell
mlx_lm.convert \
  --hf-path bknyaz/Qwen3-Next-80B-A3B-Instruct-REAM \
  --mlx-path ~/Qwen3-Next-REAM-Instruct-mlx-3bit \
  -q \
  --q-bits 3 \
  --q-group-size 64 \
  --quant-predicate mixed_3_6
```
The mixed_3_6 preset quantizes most layers to 3-bit while keeping sensitive layers (MoE down projections, select attention V projections, and the LM head) at 6-bit for better quality.
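The reported average of 3.998 bits per weight gives a rough idea of how much of the model sits at 6-bit. The sketch below is an estimate under stated assumptions, not a description of how mlx-lm computes the figure: it assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias (32 bits of overhead per group), and it ignores any tensors left unquantized. The function names are illustrative only.

```python
# Rough estimate of the share of weights stored at 6-bit.
# ASSUMPTION: each quantization group stores a 16-bit scale and a 16-bit
# bias, adding 32 bits of overhead per group. Unquantized tensors (for
# example norms) are ignored, so treat the result as approximate.


def effective_bits(q_bits: int, group_size: int = 64, overhead_bits: int = 32) -> float:
    """Bits per weight including per-group scale and bias overhead."""
    return q_bits + overhead_bits / group_size


def high_precision_fraction(
    avg_bits: float, low_bits: int = 3, high_bits: int = 6, group_size: int = 64
) -> float:
    """Fraction of weights at high_bits implied by the average bit width."""
    low = effective_bits(low_bits, group_size)
    high = effective_bits(high_bits, group_size)
    return (avg_bits - low) / (high - low)


frac = high_precision_fraction(3.998)
print(f"Estimated share of weights at 6-bit: {frac:.1%}")
```

With these assumptions, 3-bit and 6-bit weights cost about 3.5 and 6.5 bits each, so an average of 3.998 implies that roughly one sixth of the quantized weights are kept at 6-bit.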
Benchmark Results
Evaluated using lm-evaluation-harness via local-chat-completions against mlx_lm.server. Generation parameters: temperature=0.0, do_sample=False, batch_size=1.
| Benchmark | This model (MLX 3-bit) | REAMv2 60B (bf16) | Original 80B (bf16) |
|---|---|---|---|
| GSM8K (0-shot, flexible-extract) | 67.4 | — | — |
| GSM8K (5-shot, flexible-extract) | 84.6 | 78.1 | 78.6 |
| IFEval (prompt-level strict) | 82.8 | — | — |
| IFEval (prompt-level loose) | 88.5 | — | — |
| IFEval (inst-level strict) | 88.1 | — | — |
| IFEval (inst-level loose) | 92.1 | — | — |
REAMv2 and Original 80B scores are from the bknyaz model card. The model card reports IFEval as 93.4 for both REAMv2 and the original but does not specify which metric variant.
Note on GSM8K: Our 5-shot score (84.6) is higher than the model card's bf16 score (78.1). This is almost certainly due to differences in evaluation methodology (prompt template, generation parameters, lm-eval version), not the quantization improving quality. Treat these as reference points from our specific eval setup, not direct comparisons.
Usage
With mlx-lm
```shell
pip install mlx-lm
mlx_lm.chat --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit
```
As an OpenAI-compatible server
```shell
mlx_lm.server --model adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit --port 8080
```
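Once the server is running, any OpenAI-style client can talk to it. The following is a minimal sketch using only the Python standard library; it assumes the server is reachable at `http://localhost:8080/v1` (matching the `--port 8080` flag above), and the helper names `build_chat_request` and `chat` are illustrative, not part of mlx-lm.

```python
import json
import urllib.request

# ASSUMPTION: the server was started locally on port 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "adam-fleet/Qwen3-Next-80B-A3B-Instruct-REAM-mlx-3bit"


def build_chat_request(user_message: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_message: str) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_message)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hello"))` should print a single reply. The request construction is kept separate from the network call so the payload can be inspected without a running server.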
With LM Studio
Load as a local model in LM Studio using the MLX backend. Point to the downloaded model directory.
Memory Requirements
With 28GB for weights, this model needs approximately 48GB unified memory to run comfortably with moderate context lengths. On a 48GB Apple Silicon Mac (M4 Pro, M4 Max, M5 Max, etc.), expect ~16-20GB available for KV cache and OS overhead.
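To see how context length eats into that headroom, the KV cache can be estimated from the attention layout. The sketch below is an approximation under assumptions that do not come from this card: it assumes 12 full-attention layers, 2 KV heads, a head dimension of 256, and a 16-bit cache, values recalled from the upstream Qwen3-Next configuration that should be checked against the model's `config.json`. It also ignores the fixed-size state of the Gated DeltaNet layers and any runtime overhead.

```python
# Rough KV-cache estimate for the full-attention layers only.
# ASSUMPTIONS (verify against config.json): 12 full-attention layers,
# 2 KV heads, head_dim 256, 2 bytes per cached value. The Gated DeltaNet
# layers keep a fixed-size state that is not counted here.

GIB = 2**30


def kv_cache_bytes(
    num_tokens: int,
    num_attention_layers: int = 12,
    num_kv_heads: int = 2,
    head_dim: int = 256,
    bytes_per_value: int = 2,
) -> int:
    """Bytes used by keys and values for num_tokens cached positions."""
    per_token = 2 * num_attention_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token


# Headroom on a 48GB machine after loading 28GB of weights (from the text above).
headroom_gb = 48 - 28

for tokens in (8_192, 32_768, 262_144):
    size = kv_cache_bytes(tokens)
    print(f"{tokens:>7} tokens: {size / GIB:.2f} GiB of KV cache")
print(f"Headroom before OS overhead: {headroom_gb} GB")
```

Under these assumptions the cache costs about 24 KiB per token, so even the full native context stays in the single-digit GiB range. Actual usage depends on the runtime, prompt processing buffers, and whatever else is resident in unified memory.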
Important Context
Qwen3-Next-80B-A3B was released in September 2025 as an experimental architecture preview. The current-generation model in this parameter class is Qwen3.5-35B-A3B (February 2026), which incorporates the same hybrid attention innovations with improved training, vision support, and overall better benchmarks. For most users, Qwen3.5-35B-A3B at 4-bit (~22GB) will be the better daily driver. This model is provided for users interested in the Qwen3-Next architecture specifically, or who want the larger 60B parameter count in a compact MLX format.
License
Apache 2.0 — same as the original Qwen/Qwen3-Next-80B-A3B-Instruct.