DF-SSM Mamba-2 1.3B (278 MB)

1-bit scaffold + int8 LoRA distillation of Mamba-2 1.3B.
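
As a rough illustration of how a 1-bit scaffold and an int8 low-rank correction can combine in one linear layer, the sketch below computes `y = scale * sign(W) x + B (A x)`. This is a generic sketch, not the DF-SSM implementation: the function names, the per-row scale, and the single dequantization scale per LoRA factor are all assumptions made for the example.

```python
# Generic illustration only: NOT the DF-SSM implementation.
# Assumed layout: +1/-1 weights with one scale per output row, plus an int8
# low-rank correction B @ A with one dequantization scale per factor.

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def scaffold_lora_forward(signs, row_scales, a_int8, a_scale, b_int8, b_scale, x):
    """Return diag(row_scales) @ signs @ x + (b_scale * B) @ (a_scale * A) @ x."""
    base = [s * v for s, v in zip(row_scales, matvec(signs, x))]
    hidden = [a_scale * v for v in matvec(a_int8, x)]          # rank-r projection
    correction = [b_scale * v for v in matvec(b_int8, hidden)]  # back to output size
    return [b + c for b, c in zip(base, correction)]

signs = [[1, -1, 1], [-1, -1, 1]]   # 1-bit weights stored as +1/-1
row_scales = [0.5, 0.25]            # one floating-point scale per output row
a_int8 = [[64, 0, -64]]             # rank-1 factor A (1 x 3), int8 values
b_int8 = [[32], [-32]]              # rank-1 factor B (2 x 1), int8 values
x = [1.0, 2.0, 3.0]

y = scaffold_lora_forward(signs, row_scales, a_int8, 1 / 128, b_int8, 1 / 128, x)
print(y)
```

In a layout like this the scaffold stays frozen and only the small correction factors are trained, which is how the two files listed under Files could divide the work.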

Size: 278 MB (9.7× compression from 2,688 MB)
Speedup: 21.4× GPU (batch=1), 1.8× CPU
Paper:
Code: https://github.com/cs-cmyk/df-ssm
DOI: https://doi.org/10.5281/zenodo.19501057
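
The compression figure can be checked from the two sizes quoted above. The bits-per-parameter estimate in the same sketch rests on an assumption that is not stated in this card: that the full 2,688 MB baseline is FP16 weights at 2 bytes per parameter.

```python
# Sizes quoted in this card, in MB.
teacher_mb = 2688
dfssm_mb = 278

compression = teacher_mb / dfssm_mb
print(f"compression: {compression:.1f}x")

# Assumption (not from the card): the baseline is entirely FP16 weights,
# i.e. 2 bytes per parameter, so parameters = teacher size / 2 bytes.
params_in_millions = teacher_mb / 2
bits_per_param = dfssm_mb * 8 / params_in_millions
print(f"effective bits per parameter: {bits_per_param:.2f}")
```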

Files

| File | Description |
|---|---|
| `dfssm_dfw.pt` | Frozen 1-bit scaffold (155 MB) |
| `dfssm_dfw_lora.pt` | Int8 LoRA correction (12 MB) |

Usage

# See https://github.com/cs-cmyk/df-ssm for full inference code
python inference/gpu/df_ssm_inference.py \
    --scaffold dfssm_dfw.pt \
    --lora dfssm_dfw_lora.pt

Downstream Performance

| Task | DF-SSM (32M tok, 1 GPU, 6h) | FP16 Teacher (300B+ tok) | Retention | BitMamba-2 (150B tok, multi-GPU) |
|---|---|---|---|---|
| BoolQ | 60.8% | 64.2% | 94.7% | 62.4% |
| PIQA | 67.1% | 73.2% | 91.7% | 68.8% |
| HellaSwag | 41.4% | 59.9% | 69.1% | 45.6% |
| WinoGrande | 54.7% | 60.9% | 89.8% | 52.8% |
| ARC-easy | 50.2% | 64.1% | 78.3% | n/a |

The DF-SSM cost shown (32M tokens, 1 GPU, 6 h) covers distillation only; it excludes pretraining of the FP16 teacher (300B+ tokens).
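
The Retention column is consistent with the DF-SSM score divided by the FP16 teacher score. The short sketch below recomputes it from the table values; the dictionary layout and variable names are only for illustration.

```python
# (DF-SSM accuracy %, FP16 teacher accuracy %) copied from the table above.
scores = {
    "BoolQ": (60.8, 64.2),
    "PIQA": (67.1, 73.2),
    "HellaSwag": (41.4, 59.9),
    "WinoGrande": (54.7, 60.9),
    "ARC-easy": (50.2, 64.1),
}

# Retention = student score as a percentage of the teacher score.
retention = {
    task: round(100 * student / teacher, 1)
    for task, (student, teacher) in scores.items()
}

for task, value in retention.items():
    print(f"{task}: {value}%")
```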
