BabyLM 2026 Strict-Small Accuracy-Morph

This is the final Accuracy-Morph seed-2 checkpoint from the BabyLM 2026 Strict-Small masking experiments.

The model is a compact BERT-style masked language model trained from scratch under a controlled masking-only setup. The experiment keeps the dataset, tokenizer, architecture, optimizer, schedule, training length, and total 15% MLM mask rate fixed, changing only how masked positions are selected.

Intended Use

This checkpoint is intended for BabyLM 2026 shared-task evaluation and for reproducing the accompanying controlled masking study. It is not intended as a general-purpose language model.
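
For evaluation or reproduction, the checkpoint can most likely be loaded with the Hugging Face transformers library, since the released files follow its usual layout. The following is a minimal sketch under that assumption. `load_checkpoint` and `top_predictions` are illustrative helpers, not part of the release, and the path in the usage note is a placeholder.

```python
def load_checkpoint(checkpoint_dir):
    """Load the tokenizer and masked-LM weights from a local directory or Hub ID.

    Assumes `transformers` and `torch` are installed. Imports are lazy so that
    defining this helper has no side effects.
    """
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForMaskedLM.from_pretrained(checkpoint_dir)
    model.eval()  # disable dropout for evaluation
    return tokenizer, model


def top_predictions(tokenizer, model, text, k=5):
    """Return the k most likely tokens for the first [MASK] in `text`.

    Raises IndexError if `text` contains no mask token.
    """
    import torch

    # 128 matches the maximum sequence length used in training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    top_ids = logits[0, mask_positions[0]].topk(k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

Usage would look like `tokenizer, model = load_checkpoint("path/to/checkpoint")` followed by `top_predictions(tokenizer, model, "the cat sat on the [MASK].")`.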

Training Setup

Track: BabyLM 2026 Strict-Small
Training data: BabyLM-community/BabyLM-2026-Strict-Small
Tokenizer: bert-base-uncased
Architecture: compact BERT masked language model
Hidden size: 256
Layers: 4
Attention heads: 4
Intermediate size: 1024
Parameters: approximately 11.2M
Maximum sequence length: 128
Batch size: 64
Training steps: 10,000
Learning rate: 5e-4
Weight decay: 0.01
Warmup fraction: 0.06
Total MLM mask rate: 15%
Seed: 2
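
The parameter figure can be roughly cross-checked from the listed dimensions. The sketch below counts the weights of a standard BERT masked-LM layout. Several inputs are assumptions not stated in this card: a vocabulary of 30,522 entries (the size of the bert-base-uncased vocabulary), a 512-entry position table (the library default), two token types, output embeddings tied to the input embeddings, and no pooler.

```python
def bert_mlm_param_count(vocab=30522, hidden=256, layers=4, intermediate=1024,
                         max_positions=512, type_vocab=2):
    """Count parameters of a BERT-style masked LM with tied output embeddings.

    The number of attention heads does not appear: it changes how the
    projections are split, not how many weights they contain.
    """
    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix plus bias

    layer_norm = 2 * hidden  # scale and shift vectors

    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    attention = 4 * linear(hidden, hidden) + layer_norm  # Q, K, V, output
    feed_forward = (linear(hidden, intermediate)
                    + linear(intermediate, hidden) + layer_norm)
    encoder = layers * (attention + feed_forward)
    # MLM head: dense transform, layer norm, and a per-token output bias.
    # The decoder matrix is assumed tied to the word embeddings.
    mlm_head = linear(hidden, hidden) + layer_norm + vocab
    return embeddings + encoder + mlm_head


total = bert_mlm_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints: 11,201,594 parameters (~11.2M)
```

Under these assumptions the count agrees with the approximately 11.2M reported above. Most of the weights sit in the word embedding table rather than in the four encoder layers.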

Masking Policy

Accuracy-Morph uses smoothed top-1 prediction correctness as the online difficulty signal. During warmup, masking is random. After warmup, the fixed 15% MLM mask budget is allocated as:

80% random masking
15% token-level correctness-guided masking
5% character-trigram correctness-guided masking
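
To make the allocation concrete, the sketch below splits one sequence's mask budget across the three selectors. The rounding rule (round the budget to the nearest integer, then hand leftover slots to the largest remainders) and the names `SHARES_PERCENT` and `split_mask_budget` are assumptions for illustration. This card does not say how fractional slots are handled in the actual run.

```python
# Shares of the mask budget, in percent, as listed in this card.
SHARES_PERCENT = {"random": 80, "token_guided": 15, "trigram_guided": 5}


def split_mask_budget(num_tokens, mask_rate_percent=15):
    """Split a sequence's mask budget into per-selector slot counts.

    Integer arithmetic avoids floating-point surprises at the boundaries.
    """
    # Total masked positions, rounded to the nearest integer.
    budget = (num_tokens * mask_rate_percent + 50) // 100

    # Floor each share, then distribute what is left by largest remainder.
    slots = {name: budget * pct // 100 for name, pct in SHARES_PERCENT.items()}
    leftover = budget - sum(slots.values())
    by_remainder = sorted(
        SHARES_PERCENT,
        key=lambda name: (budget * SHARES_PERCENT[name]) % 100,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


print(split_mask_budget(128))
# prints: {'random': 15, 'token_guided': 3, 'trigram_guided': 1}
```

With a full 128-token sequence, this rounding gives the guided selectors 4 of the 19 masked positions, so most of the masking remains random.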

Sketches count masked-token exposures and wrong top-1 predictions. Candidate difficulty is estimated from smoothed error rates rather than raw cross-entropy loss.
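
One plausible reading of "smoothed error rate" is additive (Laplace-style) smoothing over exposure and error counts. The sketch below assumes that form, uses exact dictionaries in place of whatever sketch structure the run uses, and derives the trigram keys from the characters of the token string. `ErrorRateTracker`, its prior parameters, and `char_trigrams` are hypothetical. This card does not specify the smoothing formula or the sketch data structure.

```python
from collections import defaultdict


class ErrorRateTracker:
    """Track a smoothed top-1 error rate per key (a token or a trigram)."""

    def __init__(self, prior_errors=1.0, prior_exposures=2.0):
        # Pseudo-counts: an unseen key starts at prior_errors / prior_exposures.
        self.prior_errors = prior_errors
        self.prior_exposures = prior_exposures
        self.exposures = defaultdict(int)
        self.errors = defaultdict(int)

    def update(self, key, top1_correct):
        """Record one masked exposure and whether the top-1 guess was right."""
        self.exposures[key] += 1
        if not top1_correct:
            self.errors[key] += 1

    def difficulty(self, key):
        """Smoothed error rate in (0, 1); higher means harder."""
        # .get avoids creating entries for keys that were never masked.
        errors = self.errors.get(key, 0) + self.prior_errors
        exposures = self.exposures.get(key, 0) + self.prior_exposures
        return errors / exposures


def char_trigrams(token):
    """Overlapping character trigrams; tokens under 3 characters yield none."""
    return [token[i:i + 3] for i in range(len(token) - 2)]


tracker = ErrorRateTracker()
for correct in (False, False, True):
    tracker.update("running", correct)

print(tracker.difficulty("running"))  # (2 + 1) / (3 + 2) = 0.6
print(tracker.difficulty("unseen"))   # prior only: 1 / 2 = 0.5
print(char_trigrams("running"))       # ['run', 'unn', 'nni', 'nin', 'ing']
```

Smoothing keeps rarely masked items from jumping to an error rate of 0 or 1 after a single exposure, which raw error counts would do.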

Local Validation Result

The final local validation MLM loss for this seed-2 checkpoint was 3.0105.

In the matched seed-2 comparison, the random MLM baseline validation loss was 3.0296.
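
For scale, the two validation losses can be compared directly and converted to perplexity. The snippet assumes the reported losses are mean per-token cross-entropy in nats, which this card does not state, and the variable names are illustrative. A gap from a single seed does not by itself show how much of the difference is run-to-run variance.

```python
import math

accuracy_morph_loss = 3.0105   # this checkpoint, seed 2
random_baseline_loss = 3.0296  # matched random-masking baseline, seed 2

gap = random_baseline_loss - accuracy_morph_loss
relative_gap = gap / random_baseline_loss

print(f"absolute gap: {gap:.4f} nats")        # 0.0191
print(f"relative gap: {relative_gap:.2%}")

# Perplexity is exp(loss) when the loss is measured in nats (assumed here).
print(f"Accuracy-Morph perplexity: {math.exp(accuracy_morph_loss):.2f}")
print(f"random baseline perplexity: {math.exp(random_baseline_loss):.2f}")
```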

Files

model.safetensors: model weights
config.json: model configuration
tokenizer.json, tokenizer_config.json: tokenizer files
training_config.yaml: training and masking configuration used for the run
metadata.json: run metadata
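
After downloading, it can be worth confirming that config.json agrees with the architecture listed in this card. The sketch below assumes the file uses the standard BERT configuration keys from transformers (`hidden_size`, `num_hidden_layers`, `num_attention_heads`, `intermediate_size`). The helper names and the path in the usage note are hypothetical.

```python
import json

# Architecture values as listed in this card.
EXPECTED = {
    "hidden_size": 256,
    "num_hidden_layers": 4,
    "num_attention_heads": 4,
    "intermediate_size": 1024,
}


def load_config(path):
    """Read a config.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def config_mismatches(config):
    """Return {key: (expected, found)} for every value that disagrees."""
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }
```

Calling `config_mismatches(load_config("path/to/checkpoint/config.json"))` returns an empty dictionary when all four values match, and otherwise lists each expected and found pair.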

Caveats

This is a research checkpoint from a small-data controlled experiment. The associated paper frames the result cautiously: Accuracy-Morph is the most promising masking signal tested, but the improvement should not be treated as robust until it is confirmed across more seeds.

Weights format: Safetensors, 11.2M parameters, tensor type F32