# DeepSeek-R1-Distill-Qwen-7B – GGUF Quants

Quantized GGUF versions of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, a 7B reasoning model distilled from DeepSeek-R1 into a Qwen2.5 backbone. It brings chain-of-thought reasoning from a 671B MoE teacher into a compact 7B package, achieving state-of-the-art math and coding results at this parameter scale.
## Available Files

| File | Quant | Size | Use Case |
|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B-Q8_0.gguf | Q8_0 | ~7.7 GB | Maximum quality |
| DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf | Q6_K | ~6.0 GB | Near-lossless |
| DeepSeek-R1-Distill-Qwen-7B-Q5_K_M.gguf | Q5_K_M | ~5.2 GB | High quality |
| DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf | Q4_K_M | ~4.4 GB | Recommended default |
| DeepSeek-R1-Distill-Qwen-7B-Q3_K_M.gguf | Q3_K_M | ~3.5 GB | Low VRAM |
| DeepSeek-R1-Distill-Qwen-7B-IQ4_XS.gguf | IQ4_XS | ~3.9 GB | Imatrix 4-bit |
| DeepSeek-R1-Distill-Qwen-7B-IQ3_XXS.gguf | IQ3_XXS | ~2.9 GB | Imatrix 3-bit |
| DeepSeek-R1-Distill-Qwen-7B-IQ2_M.gguf | IQ2_M | ~2.5 GB | Imatrix 2-bit |
| DeepSeek-R1-Distill-Qwen-7B-IQ1_S.gguf | IQ1_S | ~1.8 GB | Extreme compression |
| DeepSeek-R1-Distill-Qwen-7B-fp16.gguf | FP16 | ~14.8 GB | Full precision |
| imatrix.dat | – | – | Importance matrix |
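As a rough sanity check, file size divided by parameter count gives the effective bits per weight for each quant. A minimal sketch, assuming roughly 7.6B parameters for the Qwen2.5-7B backbone (an estimate, not an official figure):

```python
# Approximate bits-per-weight for the quant sizes in the table above.
PARAMS = 7.6e9  # assumed parameter count, not an official figure

sizes_gb = {
    "Q8_0": 7.7,
    "Q6_K": 6.0,
    "Q4_K_M": 4.4,
    "IQ2_M": 2.5,
    "FP16": 14.8,
}

def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Convert a file size in GB to approximate bits per parameter."""
    return size_gb * 1e9 * 8 / params

for name, gb in sizes_gb.items():
    print(f"{name}: ~{bits_per_weight(gb):.1f} bits/weight")
```

The numbers come out slightly above the nominal quant width (e.g. Q4_K_M lands near 4.6 bits/weight) because some tensors, such as embeddings, are kept at higher precision.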
## Usage

Run with llama.cpp. Note that the DeepSeek-R1 distills ship their own DeepSeek chat template (not ChatML), so the simplest route is conversation mode (`-cnv`), which applies the template embedded in the GGUF:

```bash
./llama-cli -m DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf \
  --ctx-size 8192 -n 2048 -cnv
```

Then ask, for example: `Solve step by step: what is 15% of 240?`
Let the model think – the reasoning traces inside `<think>` blocks are where the magic happens. Give it `-n 2048` or more for complex problems.
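If you consume the output programmatically, the `<think>` reasoning trace can be split off from the final answer. A minimal sketch (the tag format is as described above; the sample completion is made up):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around a <think> block.

    Returns an empty reasoning string if no <think> block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example with a made-up completion:
output = "<think>15% of 240 is 0.15 * 240 = 36.</think>\nThe answer is 36."
reasoning, answer = split_reasoning(output)
print(answer)  # The answer is 36.
```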
## About DeepSeek-R1-Distill-Qwen-7B
- Parameters: 7B (Qwen2.5 backbone)
- Teacher: DeepSeek-R1 (671B MoE)
- Specialization: Mathematical reasoning, code generation, chain-of-thought
- License: MIT
One of the strongest reasoning-capable 7B models available. Per the DeepSeek-R1 report, the distilled models were produced by supervised fine-tuning on reasoning traces from DeepSeek-R1, which was itself trained with GRPO.

Quantized by DuoNeural using llama.cpp on an RTX 5090.
## DuoNeural

DuoNeural is an open AI research lab – human + AI in collaboration.
| Platform | Link |
|---|---|
| HuggingFace | huggingface.co/DuoNeural |
| Website | duoneural.com |
| GitHub | github.com/DuoNeural |
| X / Twitter | @DuoNeural |
| Email | duoneural@proton.me |
| Newsletter | duoneural.beehiiv.com |
| Support | buymeacoffee.com/duoneural |
## DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura of DuoNeural.
## Model tree for DuoNeural/DeepSeek-R1-Distill-Qwen-7B-GGUF

Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B