gemma-3-1b-ua-safe-summarization

Full fine-tune of unsloth/gemma-3-1b-it for Ukrainian news summarization.
Trained with Unsloth and TRL SFT on the Ukrainian split of the csebuetnlp/xlsum dataset (XL-Sum), with safety filtering applied.

Training details


Parameter	Value
Base model	unsloth/gemma-3-1b-it
Dataset	csebuetnlp/xlsum
Fine-tuning	Full SFT (no LoRA)
Framework	Unsloth + TRL
Epochs	~1.48 (checkpoint-4000, selected for best validation ROUGE-L)
Max seq length	3072
Batch size	8 per device
Learning rate	2e-5 (cosine decay, 3% warmup)
Precision	bfloat16
Optimizer	adamw_8bit
Best eval ROUGE-L	22.23
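For readers unfamiliar with the metric in the last row, here is a minimal sketch of how a ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of candidate and reference tokens. It assumes lowercasing and whitespace tokenization; the tokenizer and ROUGE implementation used for the reported score are not stated in this card, so this sketch is not expected to reproduce 22.23. The function names and the example sentences are illustrative only.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 in [0, 1], using whitespace tokens (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Made-up example pair, not taken from the dataset.
score = rouge_l_f1("уряд схвалив новий бюджет", "уряд схвалив бюджет на рік")
print(round(score * 100, 2))  # 66.67
```

Multiplying by 100 matches the usual convention of reporting ROUGE on a 0 to 100 scale.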

Response-only masking was applied: the loss is computed only on the tokens of the model turn, not on the user prompt.
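The idea behind response-only masking can be sketched in plain Python. This is an illustration under stated assumptions, not the training code used for this model: it assumes labels start as a copy of the input ids and that every position up to and including the marker that opens the model turn is set to -100, the label value that PyTorch's cross-entropy loss ignores by default. The helper names and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def find_subsequence(seq, sub):
    """Return the start index of the first occurrence of sub in seq, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def mask_prompt_labels(input_ids, response_marker_ids):
    """Copy input_ids to labels and hide everything before the response."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_marker_ids)
    if start == -1:
        # No model turn found: nothing contributes to the loss.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_marker_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Toy ids (hypothetical): prompt tokens, then marker [90, 91], then the response.
toy_ids = [2, 10, 11, 12, 90, 91, 40, 41, 42, 1]
labels = mask_prompt_labels(toy_ids, [90, 91])
print(labels)  # [-100, -100, -100, -100, -100, -100, 40, 41, 42, 1]
```

Only the last four positions keep their labels, so only the summary tokens would drive the gradient.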

Usage

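from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nuinashco/gemma-3-1b-it-xlsum-ua-sft"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16, the precision used during training.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder ("Your news text here..."); replace with the article to summarize.
article = "Ваш текст новини тут..."
# The instruction is in Ukrainian: "Make a short summary of the following text:".
prompt = [
    {"role": "user", "content": f"Зроби короткий переказ наступного тексту:\n{article}"}
]
# return_dict=True yields input_ids and attention_mask together, so generate()
# receives both and the call does not depend on the library's default return type.
inputs = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Greedy decoding, capped at 128 new tokens.
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Drop the prompt tokens and decode only the generated summary.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(out[0][prompt_len:], skip_special_tokens=True))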

Limitations

Trained for Ukrainian only; performance on other languages is undefined.
Inherits any biases present in the base model and training corpus.
Summaries may occasionally be factually inaccurate; always verify against the source.