# DF-SSM Mamba-2 1.3B (278 MB)
A 1-bit scaffold plus an int8 LoRA correction, distilled from Mamba-2 1.3B.
- Size: 278 MB (9.7× compression from 2,688 MB)
- Speedup: 21.4× on GPU (batch size 1), 1.8× on CPU
- Paper:
- Code: https://github.com/cs-cmyk/df-ssm
- DOI: https://doi.org/10.5281/zenodo.19501057
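The size figures can be sanity-checked with a few lines of arithmetic. This is a back-of-envelope sketch, assuming the 2,688 MB baseline stores every parameter in FP16 (2 bytes each); the implied parameter count and effective bits per parameter are derived here, not stated in the card, and they average over everything counted in the 278 MB rather than only the 1-bit scaffold.

```python
# Back-of-envelope size arithmetic for the figures quoted in this card.
# Assumption: the 2,688 MB baseline stores every parameter in FP16 (2 bytes each).
FP16_BYTES_PER_PARAM = 2
BASELINE_MB = 2688  # FP16 size quoted above
DFSSM_MB = 278      # DF-SSM size quoted above

compression = BASELINE_MB / DFSSM_MB
# MB / (bytes per param) gives millions of parameters (same unit base on both sides).
params_millions = BASELINE_MB / FP16_BYTES_PER_PARAM
# MB * 8 = Mbit; Mbit / M params = bits per parameter.
bits_per_param = DFSSM_MB * 8 / params_millions

print(f"compression: {compression:.1f}x")
print(f"implied parameters: {params_millions:.0f}M")
print(f"effective bits per parameter: {bits_per_param:.2f}")
```

Under that assumption this prints a 9.7× ratio, about 1,344M implied parameters, and roughly 1.65 bits per parameter on average.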
## Files
| File | Description |
|---|---|
| `dfssm_dfw.pt` | Frozen 1-bit scaffold (155 MB) |
| `dfssm_dfw_lora.pt` | Int8 LoRA correction (12 MB) |
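The card does not spell out how the two files combine at inference time. Below is a minimal sketch, assuming the common pattern of a frozen binarized weight plus an additive low-rank correction, W_eff = α·sign(W) + B·A, with a per-row mean-absolute scale α and symmetric per-tensor int8 quantization of the LoRA factors. The function names, the scaling rule, and the quantization granularity are all hypothetical illustrations, not DF-SSM's actual scheme; see the linked repository for that.

```python
"""Hypothetical sketch: 1-bit scaffold + int8 LoRA correction.

Assumptions (not taken from DF-SSM): per-row alpha = mean(|w|) binarization,
symmetric per-tensor int8 quantization of LoRA factors, W_eff = scaffold + B @ A.
"""
from typing import List, Tuple

Matrix = List[List[float]]


def binarize(w: Matrix) -> Matrix:
    """1-bit scaffold: sign(w) times a per-row scale alpha = mean(|w|)."""
    out = []
    for row in w:
        alpha = sum(abs(x) for x in row) / len(row)
        out.append([alpha if x >= 0 else -alpha for x in row])
    return out


def quantize_int8(m: Matrix) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor int8: q = round(x / scale), clamped to [-127, 127]."""
    max_abs = max(abs(x) for row in m for x in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in m]
    return q, scale


def dequantize(q: List[List[int]], scale: float) -> Matrix:
    """Map int8 codes back to floats."""
    return [[v * scale for v in row] for row in q]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x r) @ (r x m) matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def effective_weight(w: Matrix, lora_b: Matrix, lora_a: Matrix) -> Matrix:
    """Binarized scaffold plus dequantized low-rank correction B @ A."""
    scaffold = binarize(w)
    qb, sb = quantize_int8(lora_b)
    qa, sa = quantize_int8(lora_a)
    delta = matmul(dequantize(qb, sb), dequantize(qa, sa))
    return [[s + d for s, d in zip(srow, drow)] for srow, drow in zip(scaffold, delta)]


if __name__ == "__main__":
    w = [[0.5, -0.3], [-0.2, 0.6]]   # toy 2x2 FP weight
    lora_b = [[0.1], [0.0]]          # rank-1 LoRA factor B (2x1)
    lora_a = [[0.2, -0.05]]          # rank-1 LoRA factor A (1x2)
    print(effective_weight(w, lora_b, lora_a))
```

In this toy case the scaffold alone is [[0.4, -0.4], [-0.4, 0.4]], and the int8 LoRA term nudges the first row to roughly [0.42, -0.405]. Keeping the scaffold frozen and training only a small quantized correction is what lets the correction file stay small relative to the scaffold.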
## Usage
```bash
# See https://github.com/cs-cmyk/df-ssm for full inference code
python inference/gpu/df_ssm_inference.py \
  --scaffold dfssm_dfw.pt \
  --lora dfssm_dfw_lora.pt
```
## Downstream Performance
| Task | DF-SSM (32M tokens, 1 GPU, 6 h) | FP16 teacher (300B+ tokens) | Retention (DF-SSM / teacher) | BitMamba-2 (150B tokens, multi-GPU) |
|---|---|---|---|---|
| BoolQ | 60.8% | 64.2% | 94.7% | 62.4% |
| PIQA | 67.1% | 73.2% | 91.7% | 68.8% |
| HellaSwag | 41.4% | 59.9% | 69.1% | 45.6% |
| WinoGrande | 54.7% | 60.9% | 89.8% | 52.8% |
| ARC-easy | 50.2% | 64.1% | 78.3% | — |
The DF-SSM budget (32M tokens, 1 GPU, 6 h) covers distillation only; it excludes the cost of pretraining the FP16 teacher (300B+ tokens).
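The Retention column is the DF-SSM accuracy divided by the FP16 teacher accuracy. The short script below recomputes it from the table; the unweighted mean across the five tasks is an extra summary computed here, not a figure reported in the card.

```python
# Recompute the Retention column: DF-SSM accuracy / FP16 teacher accuracy.
scores = {
    # task: (DF-SSM %, FP16 teacher %)
    "BoolQ": (60.8, 64.2),
    "PIQA": (67.1, 73.2),
    "HellaSwag": (41.4, 59.9),
    "WinoGrande": (54.7, 60.9),
    "ARC-easy": (50.2, 64.1),
}

retention = {task: 100 * student / teacher for task, (student, teacher) in scores.items()}
for task, r in retention.items():
    print(f"{task:<11} {r:5.1f}%")

# Unweighted mean over tasks (not reported in the card).
mean_retention = sum(retention.values()) / len(retention)
print(f"{'Mean':<11} {mean_retention:5.1f}%")
```

Each per-task value matches the table after rounding to one decimal, and the unweighted mean comes out at about 84.7%, pulled down mostly by HellaSwag.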