DF-SSM Mamba-2 1.3B (278 MB)

A distillation of Mamba-2 1.3B into a frozen 1-bit scaffold plus an int8 LoRA correction.

Files

| File | Description | Size |
|---|---|---|
| dfssm_dfw.pt | Frozen 1-bit scaffold | 155 MB |
| dfssm_dfw_lora.pt | Int8 LoRA correction | 12 MB |

Usage

# See https://github.com/cs-cmyk/df-ssm for full inference code
python inference/gpu/df_ssm_inference.py \
    --scaffold dfssm_dfw.pt \
    --lora dfssm_dfw_lora.pt

Downstream Performance

| Task | DF-SSM (32M tok, 1 GPU, 6h) | FP16 Teacher (300B+ tok) | Retention | BitMamba-2 (150B tok, multi-GPU) |
|---|---|---|---|---|
| BoolQ | 60.8% | 64.2% | 94.7% | 62.4% |
| PIQA | 67.1% | 73.2% | 91.7% | 68.8% |
| HellaSwag | 41.4% | 59.9% | 69.1% | 45.6% |
| WinoGrande | 54.7% | 60.9% | 89.8% | 52.8% |
| ARC-easy | 50.2% | 64.1% | 78.3% | n/a |

The DF-SSM cost in the table header (32M tokens, 1 GPU, 6 hours) covers distillation only. It excludes the cost of pretraining the FP16 teacher (300B+ tokens).
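
The Retention column is not defined explicitly on this card. The reported values are consistent with retention being DF-SSM accuracy divided by FP16 teacher accuracy, and the sketch below checks that assumed definition against the table. The names `scores` and `retention` are illustrative only and are not part of the DF-SSM code base.

```python
# Accuracies (%) copied from the table above: (DF-SSM, FP16 teacher).
scores = {
    "BoolQ": (60.8, 64.2),
    "PIQA": (67.1, 73.2),
    "HellaSwag": (41.4, 59.9),
    "WinoGrande": (54.7, 60.9),
    "ARC-easy": (50.2, 64.1),
}


def retention(student: float, teacher: float) -> float:
    """Student accuracy as a percentage of teacher accuracy.

    This definition is an assumption inferred from the table,
    not something stated by the model authors.
    """
    return 100.0 * student / teacher


# Round to one decimal place to match the precision used in the table.
retentions = {task: round(retention(s, t), 1) for task, (s, t) in scores.items()}

for task, value in retentions.items():
    print(f"{task:<11} {value:.1f}%")
```

Under this definition the computed values match the Retention column for all five tasks, with HellaSwag showing the largest gap to the teacher.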
