thoughtworks/MiniMax-M2.5-Eagle3

Text Generation

speculative-decoding

Mixture of Experts

text-generation-inference


lujangusface committed 15 days ago

Commit 29cdd29 · verified · 1 Parent(s): 2fcf070

Upload README.md with huggingface_hub

Files changed (1)

README.md +15 -2


@@ -115,7 +115,9 @@ EAGLE3 trains a single-layer draft head that predicts the next token using hidde
 *The released model was fine-tuned for 6 additional epochs on 20K regenerated samples from the target model. The fine-tuned accuracy is expected to be equal or higher than these base values.*
-### Inference Benchmarks (B=1, temp=0, FP8, TP=4)
 | Dataset | Baseline (tok/s) | EAGLE3 (tok/s) | Speedup |
 |---------|-----------------|----------------|---------|
@@ -124,7 +126,18 @@ EAGLE3 trains a single-layer draft head that predicts the next token using hidde
 | SWEBench-Verified | 109.6 | 191.8 | **1.75x** |
 | Aider | 109.9 | 186.8 | **1.70x** |
-*Config: steps=3, topk=4, draft_tokens=8. All datasets at temp=0 on 8x H200 (TP=4).*
 ## Model Architecture

 *The released model was fine-tuned for 6 additional epochs on 20K regenerated samples from the target model. The fine-tuned accuracy is expected to be equal or higher than these base values.*
+### Inference Benchmarks (B=1, temp=0, TP=4)
+**With draft_tokens=8 (best B=1 config)**:
 | Dataset | Baseline (tok/s) | EAGLE3 (tok/s) | Speedup |
 |---------|-----------------|----------------|---------|
 | SWEBench-Verified | 109.6 | 191.8 | **1.75x** |
 | Aider | 109.9 | 186.8 | **1.70x** |
+*Config: steps=3, topk=4, draft_tokens=8. 8x H200 (TP=4).*
+**With draft_tokens=6 (standard config, verified 2026-04-12)**:
+| Dataset | Baseline (tok/s) | EAGLE3 (tok/s) | Speedup |
+|---------|-----------------|----------------|---------|
+| HumanEval | 109.6 | 158.0 | **1.44x** |
+| Terminal-Bench | 108.9 | 150.2 | **1.38x** |
+| MT-Bench | 109.0 | 143.6 | **1.32x** |
+| SWEBench-Verified | 109.1 | 116.5 | **1.07x** |
+*Config: steps=3, topk=4, draft_tokens=6. 4x H200 (TP=4). Server-side Prometheus metrics.*
 ## Model Architecture
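
The Speedup column in the benchmark tables above appears to be the EAGLE3 throughput divided by the baseline throughput, rounded to two decimals. A minimal sketch for re-deriving it from the tok/s figures, assuming that definition (the helper names `speedup` and `format_speedup` are hypothetical and not part of the repository):

```python
# Rows copied from the draft_tokens=6 table: (dataset, baseline tok/s, EAGLE3 tok/s).
ROWS_DRAFT_TOKENS_6 = [
    ("HumanEval", 109.6, 158.0),
    ("Terminal-Bench", 108.9, 150.2),
    ("MT-Bench", 109.0, 143.6),
    ("SWEBench-Verified", 109.1, 116.5),
]


def speedup(baseline_tok_s: float, eagle3_tok_s: float) -> float:
    """Assumed definition: ratio of EAGLE3 throughput to baseline throughput."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline throughput must be positive")
    return eagle3_tok_s / baseline_tok_s


def format_speedup(baseline_tok_s: float, eagle3_tok_s: float) -> str:
    """Format the ratio the way the tables show it, e.g. '1.44x'."""
    return f"{speedup(baseline_tok_s, eagle3_tok_s):.2f}x"


if __name__ == "__main__":
    for dataset, baseline, eagle3 in ROWS_DRAFT_TOKENS_6:
        print(f"{dataset}: {format_speedup(baseline, eagle3)}")
```

Under this definition the recomputed ratios agree with the published values for both the draft_tokens=8 and draft_tokens=6 tables.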