BabyLM 2026 Strict-Small Accuracy-Morph

This is the final Accuracy-Morph seed-2 checkpoint from the BabyLM 2026 Strict-Small masking experiments.

The model is a compact BERT-style masked language model trained from scratch under a controlled masking-only setup. The experiment keeps the dataset, tokenizer, architecture, optimizer, schedule, training length, and total 15% MLM mask rate fixed, changing only how masked positions are selected.

Intended Use

This checkpoint is intended for BabyLM 2026 shared-task evaluation and for reproducing the accompanying controlled masking study. It is not intended as a general-purpose language model.

Training Setup

  • Track: BabyLM 2026 Strict-Small
  • Training data: BabyLM-community/BabyLM-2026-Strict-Small
  • Tokenizer: bert-base-uncased
  • Architecture: compact BERT masked language model
  • Hidden size: 256
  • Layers: 4
  • Attention heads: 4
  • Intermediate size: 1024
  • Parameters: approximately 11.2M
  • Maximum sequence length: 128
  • Batch size: 64
  • Training steps: 10,000
  • Learning rate: 5e-4
  • Weight decay: 0.01
  • Warmup fraction: 0.06
  • Total MLM mask rate: 15%
  • Seed: 2
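The listed parameter count can be sanity-checked from the hyperparameters above. The following is a rough sketch, not the training code: it assumes a standard BERT layout with the 30,522-entry bert-base-uncased vocabulary, 512 learned position embeddings, 2 token types, tied MLM decoder weights, and no pooler, none of which the setup list states explicitly. It also assumes the warmup fraction is taken over the 10,000 training steps. All names are illustrative.

```python
# Assumptions (not stated in the model card): standard BERT layout,
# 512 position embeddings, 2 token types, tied decoder weights, no pooler.
VOCAB = 30522          # bert-base-uncased vocabulary size
HIDDEN = 256
LAYERS = 4
INTERMEDIATE = 1024
MAX_POSITIONS = 512    # assumed
TYPE_VOCAB = 2         # assumed


def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


LAYER_NORM = 2 * HIDDEN  # scale and shift vectors

# Word, position, and token-type embeddings, followed by one LayerNorm.
embeddings = (VOCAB + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + LAYER_NORM

# Self-attention (Q, K, V, output) and feed-forward block, each with a LayerNorm.
per_layer = (
    4 * linear(HIDDEN, HIDDEN) + LAYER_NORM
    + linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + LAYER_NORM
)

# MLM head: dense transform, LayerNorm, and a decoder bias (weights are tied).
mlm_head = linear(HIDDEN, HIDDEN) + LAYER_NORM + VOCAB

total_params = embeddings + LAYERS * per_layer + mlm_head
warmup_steps = round(0.06 * 10_000)

print(f"{total_params / 1e6:.1f}M parameters, {warmup_steps} warmup steps")
```

Under these assumptions the estimate lands at about 11.2M, consistent with the listed figure. Most of the parameters sit in the word-embedding matrix rather than in the four transformer layers.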

Masking Policy

Accuracy-Morph uses smoothed top-1 prediction correctness as the online difficulty signal. During warmup, masking is random. After warmup, the fixed 15% MLM mask budget is allocated as:

  • 80% random masking
  • 15% token-level correctness-guided masking
  • 5% character-trigram correctness-guided masking
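To make the split concrete, the sketch below turns the three shares into per-sequence mask counts. How the actual run rounds fractional counts is not described here, so the largest-remainder rounding and the function name `split_mask_budget` are assumptions made for illustration.

```python
def split_mask_budget(n_tokens, mask_rate=0.15, shares=(0.80, 0.15, 0.05)):
    """Split a sequence's mask budget into random / token / trigram counts.

    Rounding is an assumption: fractional counts are floored, then leftover
    masks go to the components with the largest fractional remainders.
    """
    budget = round(n_tokens * mask_rate)
    raw = [budget * share for share in shares]
    counts = [int(value) for value in raw]

    leftover = budget - sum(counts)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1

    return dict(zip(("random", "token", "trigram"), counts))


# A full-length sequence of 128 tokens has a budget of 19 masked positions.
allocation = split_mask_budget(128)
print(allocation)
```

For a 128-token sequence this gives 15 random, 3 token-guided, and 1 trigram-guided positions, so the guided components choose only a handful of positions per sequence while the total mask rate stays at 15%.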

Sketches record two counts per candidate: masked-token exposures and wrong top-1 predictions. Candidate difficulty is estimated from the smoothed error rate (wrong predictions relative to exposures) rather than from raw cross-entropy loss.
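A minimal sketch of this kind of difficulty signal follows. It is not the run's implementation: exact dictionaries stand in for the sketches, the additive smoothing with `prior_error` and `prior_weight` is one plausible choice among several, and `ErrorRateTracker` and `char_trigrams` are hypothetical names.

```python
from collections import defaultdict


def char_trigrams(token):
    """Return the overlapping character trigrams of a token."""
    return [token[i:i + 3] for i in range(len(token) - 2)]


class ErrorRateTracker:
    """Tracks exposures and wrong top-1 predictions per key.

    Assumption: exact counters replace the sketches, and smoothing is
    additive, pulling rarely seen keys toward `prior_error`.
    """

    def __init__(self, prior_error=0.5, prior_weight=2.0):
        self.exposures = defaultdict(int)
        self.errors = defaultdict(int)
        self.prior_error = prior_error
        self.prior_weight = prior_weight

    def update(self, key, correct):
        """Record one masked exposure and whether top-1 was correct."""
        self.exposures[key] += 1
        if not correct:
            self.errors[key] += 1

    def difficulty(self, key):
        """Smoothed error rate in [0, 1]; higher means harder."""
        numerator = self.errors[key] + self.prior_error * self.prior_weight
        return numerator / (self.exposures[key] + self.prior_weight)


tokens = ErrorRateTracker()
trigrams = ErrorRateTracker()

# One masked occurrence of "masking" that the model predicted incorrectly.
tokens.update("masking", correct=False)
for gram in char_trigrams("masking"):
    trigrams.update(gram, correct=False)

print(tokens.difficulty("masking"), trigrams.difficulty("ing"))
```

Smoothing keeps a key that was masked once and missed once from jumping straight to an error rate of 1.0, which would otherwise let rare tokens dominate the guided share of the budget.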

Local Validation Result

The final local validation MLM loss for this seed-2 checkpoint was 3.0105.

In the matched seed-2 comparison, the random MLM baseline validation loss was 3.0296, so Accuracy-Morph is lower by 0.0191 on this single seed.

Files

  • model.safetensors: model weights
  • config.json: model configuration
  • tokenizer.json, tokenizer_config.json: tokenizer files
  • training_config.yaml: training and masking configuration used for the run
  • metadata.json: run metadata
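With these files downloaded into one directory, the checkpoint should load through the standard Hugging Face `transformers` auto classes. The helper below is a hedged sketch: `top_predictions` and the directory path in the usage comment are hypothetical, and it assumes `torch` and `transformers` are installed.

```python
def top_predictions(text, checkpoint_dir, k=5):
    """Return the top-k predicted tokens for each [MASK] position in `text`.

    `checkpoint_dir` is a local directory holding the files listed above.
    """
    # Imported lazily so the helper can be defined without the libraries.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForMaskedLM.from_pretrained(checkpoint_dir)
    model.eval()

    # Truncate to the maximum sequence length used in training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]

    with torch.no_grad():
        logits = model(**inputs).logits[0]

    results = []
    for position in mask_positions:
        top_ids = logits[position].topk(k).indices.tolist()
        results.append(tokenizer.convert_ids_to_tokens(top_ids))
    return results


# Example call (the path is a placeholder, not a published location):
# top_predictions("The cat sat on the [MASK].", "./accuracy-morph-seed2")
```

Because the model is small and trained on the Strict-Small data only, the predictions are best read as a smoke test of the checkpoint rather than as a quality measure.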

Caveats

This is a research checkpoint from a small-data controlled experiment. The associated paper frames the result cautiously: Accuracy-Morph is the most promising masking signal tested, but it should not be treated as a robust improvement until it is confirmed across more seeds.

Safetensors: 11.2M parameters, tensor type F32