yat-pn+ca 261M (d=12), seed 2

Reproducibility seed for the yat-pn+ca 261M ablation (seed 0 is the canonical published checkpoint). The architecture, data, and hyperparameters are identical; only the random seed differs. Useful for estimating seed-to-seed variance when comparing architectures.

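from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model code shipped in the repo.
model = AutoModelForCausalLM.from_pretrained(
    "mlnomad/yatnmn-softplus-ca-d12-chinchilla-261M-seed2-pytorch",
    trust_remote_code=True,
)
# The tokenizer is loaded from the Mistral-7B-v0.1 repository.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")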
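
The variance-estimation use mentioned above can be sketched as follows. This is a minimal illustration, not part of the released code: the helper names `summarize_seeds` and `gap_exceeds_noise`, the threshold `k`, and the numbers are all hypothetical placeholders. It assumes you have already computed one scalar metric (for example, validation loss) per seed for each architecture.

```python
from statistics import mean, stdev


def summarize_seeds(metric_by_seed):
    """Return (mean, sample standard deviation) of one metric across seeds."""
    values = list(metric_by_seed.values())
    if len(values) < 2:
        raise ValueError("need at least two seeds to estimate variance")
    return mean(values), stdev(values)


def gap_exceeds_noise(arch_a, arch_b, k=2.0):
    """Crude check: is the gap between the two means larger than
    k times the larger seed-to-seed standard deviation?"""
    mean_a, std_a = summarize_seeds(arch_a)
    mean_b, std_b = summarize_seeds(arch_b)
    return abs(mean_a - mean_b) > k * max(std_a, std_b)


# Placeholder values for illustration only; these are NOT measured results.
placeholder_a = {0: 3.00, 2: 3.02}  # hypothetical metric for seeds 0 and 2
placeholder_b = {0: 3.20, 2: 3.22}  # hypothetical metric for another architecture

print(summarize_seeds(placeholder_a))
print(gap_exceeds_noise(placeholder_a, placeholder_b))
```

With only two or three seeds the standard deviation is itself a noisy estimate, so a check like this is best read as a sanity filter and not as a significance test.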

License: Apache 2.0.

Safetensors · Model size: 0.3B params · Tensor type: F32