Llama-3.2-3B-Instruct-Mongolian

A LoRA fine-tuned version of meta-llama/Llama-3.2-3B-Instruct for Mongolian language instruction-following and chat.

Model Description

This model adapts Llama 3.2 3B Instruct to understand and generate fluent Mongolian text. It was fine-tuned using LoRA (Low-Rank Adaptation) on the saillab/alpaca-mongolian-cleaned dataset containing ~41,600 Mongolian instruction-following examples.

The base model struggles with Mongolian, producing garbled or incoherent text. After fine-tuning, the model generates fluent, coherent Mongolian responses across a wide range of topics.

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-3B-Instruct"
ADAPTER = "munkhbayar-batkhuu/Llama-3.2-3B-Instruct-Mongolian"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant that responds in Mongolian."},
    {"role": "user", "content": "Монгол улсын нийслэл хаана байдаг вэ?"},
]

# add_generation_prompt=True appends the assistant header so the model
# starts a new reply instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, temperature=0.7, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Output: Монгол улсын нийслэл нь Улаанбаатар хот юм.
```

Training Details

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Method | LoRA (PEFT) |
| Dataset | saillab/alpaca-mongolian-cleaned (~41,600 examples) |
| Train/Eval Split | 39,520 / 2,081 (95/5, seed=42) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 48.6M (1.49% of 3.26B) |
| Epochs | 3 |
| Batch Size | 4 (×4 gradient accumulation = effective 16) |
| Learning Rate | 2e-4 (cosine schedule) |
| Warmup | 5% of steps |
| Precision | float16 |
| Max Sequence Length | 512 |
| Training Steps | 7,410 |
| Training Time | ~130.3 hours |
| Final Train Loss | 0.671 |
| Final Eval Loss | 0.628 |
| Token Accuracy | 83% |
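The hyperparameters above map directly onto a PEFT adapter configuration plus standard trainer arguments. The fragment below is a hypothetical reconstruction for illustration (the actual training script is not published here); field names follow the current peft/transformers APIs.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the table above.
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# Effective batch size: 4 × 4 accumulation steps = 16, so
# 39,520 examples / 16 ≈ 2,470 steps per epoch = 7,410 steps over 3 epochs.
# Sequences are truncated to 512 tokens during preprocessing.
training_args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    fp16=True,
)
```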

Chat Template

Training used the Llama 3.2 chat template with the system prompt: "You are a helpful assistant that responds in Mongolian."
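As an illustration, the Llama 3.x template renders a system + user turn roughly as sketched below. This is an approximation for readers unfamiliar with the format; the authoritative rendering comes from `tokenizer.apply_chat_template` on the base model's tokenizer.

```python
def render_llama3_chat(system: str, user: str) -> str:
    # Approximate Llama 3.x chat format (header tokens + end-of-turn markers).
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = render_llama3_chat(
    "You are a helpful assistant that responds in Mongolian.",
    "Монгол улсын нийслэл хаана байдаг вэ?",
)
```

The trailing assistant header is what cues the model to begin its reply.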

Benchmark Results

MM-Eval (Mongolian Multi-task Evaluation)

MM-Eval (arXiv:2411.09492) is a hierarchical benchmark for evaluating LLMs on Mongolian language tasks across 1,840 items.

| Category | Items | Base Model | Fine-tuned | Delta |
|---|---|---|---|---|
| Syntax | 569 (MCQ) | 26.89% | 35.33% | +8.44% |
| Semantics | 677 (MCQ) | 27.47% | 37.37% | +9.90% |
| Knowledge | 344 (MCQ) | 32.85% | 67.44% | +34.59% |
| Reasoning | 250 (numeric) | 3.20% | 0.80% | -2.40% |
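For the MCQ categories, scoring reduces to exact-match accuracy over predicted option letters. A minimal sketch of such scoring (not the official MM-Eval harness; `grade_mcq` is a hypothetical helper):

```python
def grade_mcq(predictions: list[str], answers: list[str]) -> float:
    # Exact-match accuracy over option letters (A/B/C/D),
    # case-insensitive and whitespace-tolerant.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)

acc = grade_mcq(["A", "c ", "B", "D"], ["A", "C", "D", "D"])  # 3/4 correct -> 0.75
```

Note that random guessing on a 4-option MCQ scores ~25%, which puts the base model's Syntax and Semantics results near chance.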

Perplexity (on eval split, 2,081 samples)

| Model | Perplexity | Avg Loss |
|---|---|---|
| Base | 18.31 | 2.9075 |
| Fine-tuned | 1.99 | 0.6881 |
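Perplexity here is simply the exponential of the average token-level cross-entropy loss, so the two columns are mutually consistent:

```python
import math

def perplexity(avg_loss: float) -> float:
    # PPL = exp(mean cross-entropy over all evaluated tokens)
    return math.exp(avg_loss)

base_ppl = perplexity(2.9075)   # ≈ 18.31
tuned_ppl = perplexity(0.6881)  # ≈ 1.99
```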

BLEU / ROUGE-L (200 eval samples)

| Metric | Base Model | Fine-tuned | Improvement |
|---|---|---|---|
| BLEU-1 | 0.1373 | 0.3245 | +136% |
| BLEU-2 | 0.0427 | 0.1281 | +200% |
| ROUGE-L | 0.0760 | 0.1874 | +147% |
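To show the mechanics behind BLEU-1, here is a minimal single-reference sketch (clipped unigram precision with a brevity penalty). The reported numbers were presumably computed with a standard toolkit; this is only illustrative.

```python
from collections import Counter
import math

def bleu1(reference: list[str], candidate: list[str]) -> float:
    # Clipped unigram precision: each candidate token counts only up to
    # its frequency in the reference.
    ref_counts = Counter(reference)
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(candidate).items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1)
    )
    return bp * precision

ref = "монгол улсын нийслэл нь улаанбаатар хот юм".split()
score = bleu1(ref, ref)  # exact match scores 1.0
```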

Example Outputs

Prompt: "Монгол улсын нийслэл хаана байдаг вэ?" (Where is the capital of Mongolia?)

| Model | Response |
|---|---|
| Base | Улсын нийслэл хаан байдаг байна. Улсын нийслэл нь Улаанбаatar... (garbled) |
| Fine-tuned | Монгол улсын нийслэл нь Улаанбаатар хот юм. (The capital of Mongolia is Ulaanbaatar.) |

Prompt: "Хүүхдэд зориулж богино үлгэр бичнэ үү." (Write a short story for children.)

| Model | Response |
|---|---|
| Base | Хүүхдэд зориулж богино үлгэр бичихээс бээрхээр эрхэмтгээр өрөмддэг... (incoherent) |
| Fine-tuned | Эрт урьдын цагт өтгөн ногоон ойд Жак хэмээх бяцхан хөгшин байжээ... (Long ago in a thick green forest lived an old man named Jack...) |

Prompt: "Монгол хоолны жор бичнэ үү." (Write a Mongolian food recipe.)

| Model | Response |
|---|---|
| Base | Холны жор хэлээр байна. Хэрээрт хэрээрт бүртгэлтэй... (incoherent) |
| Fine-tuned | Миний сонгосон жор: Гоймонгийн амтат чинжүү. Орц: 12 ширхэг улаан чинжүү, 1/4 аяга цагаан элсэн чихэр... (Recipe with ingredients list) |

Limitations

  • Domain: Trained on general instruction-following data; may not perform well on specialized domains (medical, legal, technical)
  • Math/Reasoning: Mathematical reasoning did not improve (slightly declined on MM-Eval reasoning)
  • Hallucination: Like all LLMs, may generate plausible but factually incorrect information
  • Sequence Length: Trained with max 512 tokens; may degrade on longer inputs
  • Model Size: 3B parameters; larger models would likely achieve better results

Framework Versions

  • PEFT: 0.18.1
  • TRL: 0.27.2
  • Transformers: 5.1.0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.5.0

Citation

If you use this model, please cite:

@misc{llama32-3b-mongolian-2026,
  title={Llama-3.2-3B-Instruct-Mongolian},
  author={Munkhbayar Batkhuu},
  year={2026},
  url={https://huggingface.co/munkhbayar-batkhuu/Llama-3.2-3B-Instruct-Mongolian}
}

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • SAIL Lab for the Mongolian Alpaca dataset
  • MM-Eval authors for the Mongolian evaluation benchmark