Llama-3.2-3B-Instruct-Mongolian

A LoRA fine-tuned version of meta-llama/Llama-3.2-3B-Instruct for Mongolian language instruction-following and chat.

Model Description

This model adapts Llama 3.2 3B Instruct to understand and generate fluent Mongolian text. It was fine-tuned using LoRA (Low-Rank Adaptation) on the saillab/alpaca-mongolian-cleaned dataset containing ~41,600 Mongolian instruction-following examples.

The base model struggles with Mongolian, producing garbled or incoherent text. After fine-tuning, the model generates fluent, coherent Mongolian responses across a wide range of topics.

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "meta-llama/Llama-3.2-3B-Instruct"
ADAPTER = "munkhbayar-batkhuu/Llama-3.2-3B-Instruct-Mongolian"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant that responds in Mongolian."},
    {"role": "user", "content": "Монгол улсын нийслэл хаана байдаг вэ?"},
]

# add_generation_prompt=True appends the assistant header so the model
# starts a new reply instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, temperature=0.7, top_p=0.9, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
# Output: Монгол улсын нийслэл нь Улаанбаатар хот юм.
```

Training Details

| Parameter | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Method | LoRA (PEFT) |
| Dataset | saillab/alpaca-mongolian-cleaned (~41,600 examples) |
| Train/Eval Split | 39,520 / 2,081 (95/5, seed=42) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 48.6M (1.49% of 3.26B) |
| Epochs | 3 |
| Batch Size | 4 (×4 gradient accumulation = effective 16) |
| Learning Rate | 2e-4 (cosine schedule) |
| Warmup | 5% of steps |
| Precision | float16 |
| Max Sequence Length | 512 |
| Training Steps | 7,410 |
| Training Time | ~130.3 hours |
| Final Train Loss | 0.671 |
| Final Eval Loss | 0.628 |
| Token Accuracy | 83% |
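The hyperparameters above map directly onto a PEFT adapter configuration plus standard trainer arguments. The fragment below is a hypothetical reconstruction for illustration (the actual training script is not published here); field names follow the current peft/transformers APIs.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration matching the table above.
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)

# Effective batch size: 4 × 4 accumulation steps = 16, so
# 39,520 examples / 16 ≈ 2,470 steps per epoch = 7,410 steps over 3 epochs.
# Sequences are truncated to 512 tokens during preprocessing.
training_args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    fp16=True,
)
```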

Chat Template

Training used the Llama 3.2 chat template with the system prompt: "You are a helpful assistant that responds in Mongolian."
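As an illustration, the Llama 3.x template renders a system + user turn roughly as sketched below. This is an approximation for readers unfamiliar with the format; the authoritative rendering comes from `tokenizer.apply_chat_template` on the base model's tokenizer.

```python
def render_llama3_chat(system: str, user: str) -> str:
    # Approximate Llama 3.x chat format (header tokens + end-of-turn markers).
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = render_llama3_chat(
    "You are a helpful assistant that responds in Mongolian.",
    "Монгол улсын нийслэл хаана байдаг вэ?",
)
```

The trailing assistant header is what cues the model to begin its reply.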

Benchmark Results

MM-Eval (Mongolian Multi-task Evaluation)

MM-Eval (arXiv:2411.09492) is a hierarchical benchmark for evaluating LLMs on Mongolian language tasks across 1,840 items.

| Category | Items | Base Model | Fine-tuned | Delta |
|---|---|---|---|---|
| Syntax | 569 (MCQ) | 26.89% | 35.33% | +8.44% |
| Semantics | 677 (MCQ) | 27.47% | 37.37% | +9.90% |
| Knowledge | 344 (MCQ) | 32.85% | 67.44% | +34.59% |
| Reasoning | 250 (numeric) | 3.20% | 0.80% | -2.40% |
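For the MCQ categories, scoring reduces to exact-match accuracy over predicted option letters. A minimal sketch of such scoring (not the official MM-Eval harness; `grade_mcq` is a hypothetical helper):

```python
def grade_mcq(predictions: list[str], answers: list[str]) -> float:
    # Exact-match accuracy over option letters (A/B/C/D),
    # case-insensitive and whitespace-tolerant.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)

acc = grade_mcq(["A", "c ", "B", "D"], ["A", "C", "D", "D"])  # 3/4 correct -> 0.75
```

Note that random guessing on a 4-option MCQ scores ~25%, which puts the base model's Syntax and Semantics results near chance.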

Perplexity (on eval split, 2,081 samples)

| Model | Perplexity | Avg Loss |
|---|---|---|
| Base | 18.31 | 2.9075 |
| Fine-tuned | 1.99 | 0.6881 |
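Perplexity here is simply the exponential of the average token-level cross-entropy loss, so the two columns are mutually consistent:

```python
import math

def perplexity(avg_loss: float) -> float:
    # PPL = exp(mean cross-entropy over all evaluated tokens)
    return math.exp(avg_loss)

base_ppl = perplexity(2.9075)   # ≈ 18.31
tuned_ppl = perplexity(0.6881)  # ≈ 1.99
```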

BLEU / ROUGE-L (200 eval samples)

| Metric | Base Model | Fine-tuned | Improvement |
|---|---|---|---|
| BLEU-1 | 0.1373 | 0.3245 | +136% |
| BLEU-2 | 0.0427 | 0.1281 | +200% |
| ROUGE-L | 0.0760 | 0.1874 | +147% |
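To show the mechanics behind BLEU-1, here is a minimal single-reference sketch (clipped unigram precision with a brevity penalty). The reported numbers were presumably computed with a standard toolkit; this is only illustrative.

```python
from collections import Counter
import math

def bleu1(reference: list[str], candidate: list[str]) -> float:
    # Clipped unigram precision: each candidate token counts only up to
    # its frequency in the reference.
    ref_counts = Counter(reference)
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(candidate).items())
    precision = clipped / max(len(candidate), 1)
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(
        1 - len(reference) / max(len(candidate), 1)
    )
    return bp * precision

ref = "монгол улсын нийслэл нь улаанбаатар хот юм".split()
score = bleu1(ref, ref)  # exact match scores 1.0
```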

Example Outputs

Prompt: "Монгол улсын нийслэл хаана байдаг вэ?" (Where is the capital of Mongolia?)

| Model | Response |
|---|---|
| Base | Улсын нийслэл хаан байдаг байна. Улсын нийслэл нь Улаанбаatar... (garbled) |
| Fine-tuned | Монгол улсын нийслэл нь Улаанбаатар хот юм. (The capital of Mongolia is Ulaanbaatar.) |

Prompt: "Хүүхдэд зориулж богино үлгэр бичнэ үү." (Write a short story for children.)

| Model | Response |
|---|---|
| Base | Хүүхдэд зориулж богино үлгэр бичихээс бээрхээр эрхэмтгээр өрөмддэг... (incoherent) |
| Fine-tuned | Эрт урьдын цагт өтгөн ногоон ойд Жак хэмээх бяцхан хөгшин байжээ... (Long ago in a thick green forest lived an old man named Jack...) |

Prompt: "Монгол хоолны жор бичнэ үү." (Write a Mongolian food recipe.)

| Model | Response |
|---|---|
| Base | Холны жор хэлээр байна. Хэрээрт хэрээрт бүртгэлтэй... (incoherent) |
| Fine-tuned | Миний сонгосон жор: Гоймонгийн амтат чинжүү. Орц: 12 ширхэг улаан чинжүү, 1/4 аяга цагаан элсэн чихэр... (Recipe with ingredients list) |

Limitations

  • Domain: Trained on general instruction-following data; may not perform well on specialized domains (medical, legal, technical)
  • Math/Reasoning: Mathematical reasoning did not improve (slightly declined on MM-Eval reasoning)
  • Hallucination: Like all LLMs, may generate plausible but factually incorrect information
  • Sequence Length: Trained with max 512 tokens; may degrade on longer inputs
  • Model Size: 3B parameters; larger models would likely achieve better results

Framework Versions

  • PEFT: 0.18.1
  • TRL: 0.27.2
  • Transformers: 5.1.0
  • PyTorch: 2.10.0+cu128
  • Datasets: 4.5.0

Citation

If you use this model, please cite:

@misc{llama32-3b-mongolian-2026,
  title={Llama-3.2-3B-Instruct-Mongolian},
  author={Munkhbayar Batkhuu},
  year={2026},
  url={https://huggingface.co/munkhbayar-batkhuu/Llama-3.2-3B-Instruct-Mongolian}
}

Acknowledgments

  • Meta AI for the Llama 3.2 base model
  • SAIL Lab for the Mongolian Alpaca dataset
  • MM-Eval authors for the Mongolian evaluation benchmark