Ling 2.6 Flash — MXFP4 + CRACK
MXFP4 quantized | CRACK abliterated | Hybrid MLA + Linear-Attn MoE | EN + ZH | 63 GB
What Is This?
This is Ling 2.6 Flash by inclusionAI — a 35B-parameter Mixture-of-Experts model with 256 routed experts (8 active per token) + 1 always-active shared expert, hybrid MLA + Lightning Linear-Attention architecture, native English + Chinese, 131K context.
It has been:
- MXFP4 quantized — uniform 4-bit affine, group_size=32 — 63 GB
- CRACK abliterated — permanent weight-level removal of safety refusal
| Property | Value |
|---|---|
| Base model | inclusionAI/Ling-2.6-flash (35B total; 1 shared + 8 routed experts active per token) |
| Architecture | bailing_hybrid: Multi-head Latent Attention (MLA) every 8th layer, Lightning Linear-Attn elsewhere |
| Quantization | MXFP4 (Q4 g=32 affine), 63 GB |
| MMLU-200 | 78.5% (base 80.0%, Δ −1.5pp) |
| HarmBench-320 | 97.8% comply (base 50.3%, Δ +47.5pp) |
| Context | 131,072 tokens native |
| Languages | English + Chinese (probed bilingual) |
| Speed | 30+ tok/s on M4 Max 128 GB |
| Fits on | 96 GB+ Macs |
MMLU-200 Results (thinking OFF)
| Model | Correct | Accuracy | No-match |
|---|---|---|---|
| MXFP4 Base | 160/200 | 80.00% | 6 |
| MXFP4 + CRACK | 157/200 | 78.50% | 10 |
| Δ | −3 | −1.5pp | +4 |
The CRACK delta of −1.5pp (3 questions) is well inside the sampling noise of a 200-question set, where the 95% margin on an 80% score is roughly ±5.5pp, so capability is essentially preserved.
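As a rough check on the noise-floor claim, the sampling error of an accuracy measured on 200 questions can be estimated with a binomial standard error. This is a minimal sketch, assuming each question behaves as an independent Bernoulli trial and using a normal approximation; the helper name `accuracy_margin` is illustrative and not part of any evaluation harness used for this card.

```python
import math


def accuracy_margin(correct: int, total: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval, in percentage points."""
    p = correct / total
    standard_error = math.sqrt(p * (1.0 - p) / total)
    return 100.0 * z * standard_error


# Counts taken from the MMLU-200 table above.
base_margin = accuracy_margin(160, 200)
crack_margin = accuracy_margin(157, 200)
delta_pp = 100.0 * (157 - 160) / 200

print(f"base margin:  +/- {base_margin:.1f}pp")
print(f"crack margin: +/- {crack_margin:.1f}pp")
print(f"delta:        {delta_pp:+.1f}pp")
```

Under these assumptions each score carries a margin several times larger than the observed difference. Both models answered the same questions, so a paired comparison would be the more precise tool; the unpaired margin here is only meant as an order-of-magnitude check.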
HarmBench-320 Results
| Model | COMPLY | REFUSE | EMPTY |
|---|---|---|---|
| MXFP4 Base | 161 (50.3%) | 157 (49.1%) | 2 (0.6%) |
| MXFP4 + CRACK | 313 (97.8%) | 5 (1.6%) | 2 (0.6%) |
| Δ comply | +47.5pp | | |
Removing the refusal direction lifts HarmBench compliance from ~50% to ~98% with negligible MMLU regression.
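To put the two deltas on a common scale, a two-proportion z-statistic can be computed from the counts in the tables. This is a sketch only: it treats the base and CRACK runs as independent samples, which is an approximation because both were scored on the same prompts, and the function name `two_proportion_z` is illustrative.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """z-statistic for the difference p2 - p1 using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p2 - p1) / standard_error


# HarmBench-320 comply counts: base 161, CRACK 313.
z_harmbench = two_proportion_z(161, 320, 313, 320)
# MMLU-200 correct counts: base 160, CRACK 157.
z_mmlu = two_proportion_z(160, 200, 157, 200)

print(f"HarmBench z: {z_harmbench:.1f}")
print(f"MMLU z:      {z_mmlu:.2f}")
```

With these inputs the HarmBench shift lies far outside what sampling noise would produce, while the MMLU shift stays well under one standard error.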
Ling 2.6 Flash CRACK Series
| Model | Format | Size | MMLU-200 | HarmBench-320 | Fits on |
|---|---|---|---|---|---|
| MXFP4 + CRACK (this model) | affine 4-bit g=32 | 63 GB | 78.5% | 97.8% | 96 GB Mac |
| JANGTQ2 + CRACK | TurboQuant 2-bit experts + 8-bit affine | 29 GB | 81.0% | 100.0% | 48 GB Mac |
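The JANGTQ2 variant is smaller and scores higher on both benchmarks, which suggests that quantization noise on the 2-bit routed experts does no harm here and may even help. The 2.5pp MMLU gap is itself within sampling noise for a 200-question set.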
Usage
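```python
from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("dealignai/Ling-2.6-flash-MXFP4-CRACK")

messages = [{"role": "user", "content": "Hello, what can you do?"}]
# Render the chat template to a prompt string.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# verbose=True streams the generation to stdout, so no extra print() is needed.
text = generate(model, tokenizer, prompt=prompt, max_tokens=400, verbose=True)
```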
Requires mlx_lm >= 0.20 with support for the bailing_hybrid model class.
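The version requirement can be checked before loading 63 GB of weights. The following is a minimal sketch, assuming the package is installed under the distribution name `mlx-lm`; the helpers `version_tuple`, `meets_minimum`, and `check_mlx_lm` are hypothetical names introduced for illustration. It compares version numbers only and does not verify that the installed build actually ships the bailing_hybrid model class.

```python
from importlib import metadata

MINIMUM_MLX_LM = "0.20"  # minimum version stated in this card


def version_tuple(version: str) -> tuple:
    """Parse leading numeric components, e.g. '0.21.1' -> (0, 21, 1)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MINIMUM_MLX_LM) -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_mlx_lm() -> str:
    """Describe whether the local mlx-lm install satisfies the stated minimum."""
    try:
        installed = metadata.version("mlx-lm")
    except metadata.PackageNotFoundError:
        return "mlx-lm is not installed"
    status = "ok" if meets_minimum(installed) else "too old"
    return f"mlx-lm {installed}: {status} (need >= {MINIMUM_MLX_LM})"


print(check_mlx_lm())
```

The parser is deliberately simple and ignores pre-release ordering; a project that already depends on `packaging` could use its version parsing instead.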
About This Model
Ling 2.6 Flash is the latency-tier sibling in the Ling 2.6 family, aimed at fast multilingual instruction following and tool use. The chat template includes a <think>...</think> reasoning block, but in practice this Flash variant is best treated as a non-reasoning instruct model: leave thinking OFF (the default) for benchmark-style work and short-form responses. For chain-of-thought reasoning, prefer the larger Ling 2.6 Plus / Ring / Pro tier.
CRACK is a permanent weight-level abliteration that removes safety refusal from the always-active residual-stream writers. The refusal direction was extracted from both English and Chinese prompts, so the model complies in both languages. MoE routing and expert internals are untouched. Ling 2.6 Flash is text-only, so there is no vision tower to modify.
Support dealignai
All models are built from original research and published for free.
Support us on Ko-fi — check out the Ko-fi membership for early access and extras.
Disclaimer
This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.
The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Ling model.