BabyLM 2026 Strict-Small Accuracy-Morph
This is the final Accuracy-Morph seed-2 checkpoint from the BabyLM 2026 Strict-Small masking experiments.
The model is a compact BERT-style masked language model trained from scratch under a controlled masking-only setup. The experiment keeps the dataset, tokenizer, architecture, optimizer, schedule, training length, and total 15% MLM mask rate fixed, changing only how masked positions are selected.
Intended Use
This checkpoint is intended for BabyLM 2026 shared-task evaluation and for reproducing the accompanying controlled masking study. It is not intended as a general-purpose language model.
Training Setup
- Track: BabyLM 2026 Strict-Small
- Training data: BabyLM-community/BabyLM-2026-Strict-Small
- Tokenizer: bert-base-uncased
- Architecture: compact BERT masked language model
- Hidden size: 256
- Layers: 4
- Attention heads: 4
- Intermediate size: 1024
- Parameters: approximately 11.2M
- Maximum sequence length: 128
- Batch size: 64
- Training steps: 10,000
- Learning rate: 5e-4
- Weight decay: 0.01
- Warmup fraction: 0.06
- Total MLM mask rate: 15%
- Seed: 2
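As a sanity check on the figures above, the parameter count and warmup length can be reproduced with plain arithmetic. This is a minimal sketch, not the training code. It assumes the standard BERT masked-LM layout (tied input/output embeddings, no pooler), the bert-base-uncased vocabulary size of 30,522, and 512 learned position embeddings; none of these three values is stated in this card. The number of attention heads does not affect the count.

```python
def bert_mlm_param_count(vocab=30522, hidden=256, layers=4,
                         intermediate=1024, max_positions=512, type_vocab=2):
    """Count parameters of a BERT-style masked LM (assumed layout, see lead-in)."""
    layer_norm = 2 * hidden  # scale and bias

    # Word, position, and token-type embeddings plus the embedding LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections with biases.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: up-projection, down-projection, LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)

    encoder = layers * (attention + feed_forward)

    # MLM head: dense transform, LayerNorm, and output bias.
    # The decoder weight matrix is assumed tied to the word embeddings.
    mlm_head = (hidden * hidden + hidden) + layer_norm + vocab

    return embeddings + encoder + mlm_head


total_params = bert_mlm_param_count()
warmup_steps = round(0.06 * 10_000)  # warmup fraction times training steps

print(f"parameters: {total_params:,}")
print(f"warmup steps: {warmup_steps}")
```

Under these assumptions the count comes to 11,201,594, which matches the stated figure of approximately 11.2M, and warmup covers the first 600 of the 10,000 steps.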
Masking Policy
Accuracy-Morph uses smoothed top-1 prediction correctness as the online difficulty signal. During warmup, masking is random. After warmup, the fixed 15% MLM mask budget is allocated as:
- 80% random masking
- 15% token-level correctness-guided masking
- 5% character-trigram correctness-guided masking
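To make the split concrete, the sketch below divides the mask budget of a single sequence across the three selection modes. The function name, the per-sequence granularity, and the rounding scheme are assumptions for illustration; the run's actual allocation logic is in training_config.yaml and the training code, not reproduced here.

```python
def split_mask_budget(num_tokens, mask_rate=0.15,
                      token_share=0.15, trigram_share=0.05):
    """Split a fixed mask budget into random and guided portions.

    Hypothetical helper: rounding is an assumption, and the random
    portion absorbs any rounding remainder so the total stays fixed.
    """
    budget = round(mask_rate * num_tokens)
    token_guided = round(token_share * budget)
    trigram_guided = round(trigram_share * budget)
    random_masks = budget - token_guided - trigram_guided
    return {
        "random": random_masks,
        "token_guided": token_guided,
        "trigram_guided": trigram_guided,
    }


# A full-length sequence of 128 tokens has a budget of 19 masked positions.
allocation = split_mask_budget(128)
print(allocation)
```

For a 128-token sequence this gives 15 random, 3 token-guided, and 1 trigram-guided masked positions, so the guided portions change only a handful of positions per sequence.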
Sketches count masked-token exposures and wrong top-1 predictions. Candidate difficulty is estimated from smoothed error rates rather than raw cross-entropy loss.
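One way to picture the difficulty signal is a pair of counters per key, with a smoothed ratio on top. The following is an illustrative sketch only: the class name, the additive smoothing constants, and the trigram extraction are assumptions, and exact dictionaries stand in for whatever sketch data structure the run actually used.

```python
from collections import Counter


class ErrorRateTracker:
    """Track masked exposures and wrong top-1 predictions per key.

    Hypothetical stand-in for the sketches described above. Keys can be
    token ids or character trigrams.
    """

    def __init__(self, alpha=1.0, beta=1.0):
        self.exposures = Counter()
        self.errors = Counter()
        self.alpha = alpha  # pseudo-count of errors (assumed value)
        self.beta = beta    # pseudo-count of correct predictions (assumed value)

    def update(self, key, top1_correct):
        """Record one masked exposure and whether the top-1 guess was right."""
        self.exposures[key] += 1
        if not top1_correct:
            self.errors[key] += 1

    def difficulty(self, key):
        """Smoothed error rate; unseen keys fall back to the prior mean."""
        return (self.errors[key] + self.alpha) / (
            self.exposures[key] + self.alpha + self.beta
        )


def char_trigrams(word):
    """Return the overlapping character trigrams of a word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


tracker = ErrorRateTracker()
for correct in (False, False, True):
    tracker.update("masking", correct)

print(tracker.difficulty("masking"))  # 3 exposures, 2 errors
print(tracker.difficulty("unseen"))   # no exposures yet
print(char_trigrams("masking"))
```

Smoothing keeps rarely masked keys from looking maximally hard or easy after one or two exposures, which is the practical difference from ranking by raw error counts.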
Local Validation Result
The final local validation MLM loss for this seed-2 checkpoint was:
3.0105
In the matched seed-2 comparison, the random MLM baseline validation loss was 3.0296.
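For scale, the gap between the two reported losses can be worked out directly. The snippet below is plain arithmetic on the two numbers above; converting to perplexity assumes the losses are mean natural-log cross-entropy over masked tokens, which this card does not state explicitly.

```python
import math

accuracy_morph_loss = 3.0105
random_baseline_loss = 3.0296

# Absolute and relative difference in validation MLM loss.
gap = random_baseline_loss - accuracy_morph_loss
relative_gap = gap / random_baseline_loss

# Masked-token perplexity, assuming the loss is measured in nats.
ppl_morph = math.exp(accuracy_morph_loss)
ppl_random = math.exp(random_baseline_loss)

print(f"loss gap: {gap:.4f} ({relative_gap:.2%} of baseline)")
print(f"perplexity: {ppl_morph:.2f} vs {ppl_random:.2f}")
```

The difference is 0.0191 in loss, roughly 0.6% of the baseline value, and it comes from a single matched seed.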
Files
- model.safetensors: model weights
- config.json: model configuration
- tokenizer.json, tokenizer_config.json: tokenizer files
- training_config.yaml: training and masking configuration used for the run
- metadata.json: run metadata
Caveats
This is a research checkpoint from a small-data controlled experiment. The associated paper frames the result cautiously: Accuracy-Morph is the most promising tested masking signal, but it is not a settled robust improvement without broader multi-seed confirmation.