Model created for the paper "Preferences for Idiomatic Language are Acquired Slowly --- and Forgotten Quickly: A Case Study on Swedish", TACL 2026.

Citation

@misc{kunz2026preferencesidiomaticlanguageacquired,
      title={Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish}, 
      author={Jenny Kunz},
      year={2026},
      eprint={2602.03484},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.03484}, 
}

Training

This is a SmolLM2-135M model, continually pre-trained on the Swedish portion of Fineweb-2 and then instruction-tuned on Smol-Smoltalk, machine-translated into Swedish with Gemma3-27B.

  • Epochs: 1
  • Learning rate: 5e-4
  • LR scheduler: cosine
  • Warmup ratio: 0.05
  • Per-device batch size: 1
  • GPUs: 4× A100 (40 GB)
  • Gradient accumulation steps: 64
  • Effective batch size: 256
  • Max. context length: 8192 tokens
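The hyperparameters above can be sanity-checked in plain Python. This is an illustrative sketch, not the actual training code: the `lr_at` helper and the total step count are assumptions, implementing the standard linear-warmup-then-cosine-decay schedule the list describes.

```python
import math

# Effective batch size = per-device batch × grad accumulation × number of GPUs
per_device_batch = 1
grad_accum_steps = 64
num_gpus = 4
effective_batch = per_device_batch * grad_accum_steps * num_gpus  # 256

def lr_at(step, total_steps, peak_lr=5e-4, warmup_ratio=0.05):
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first 5% of steps
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with a hypothetical run of 1000 optimizer steps, the learning rate rises linearly to 5e-4 over the first 50 steps and then decays to 0 by the final step.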

Limitations

This is a research model intended for studying pre-training dynamics; I do not recommend using it for any practical purpose. It was trained on a web corpus and no alignment has been performed, so the model will likely reflect the biases of its training data and hallucinate frequently.
