---
language:
- eng
- lug
license: apache-2.0
datasets:
- reuben256/tekjuice-eng-lug-target
metrics:
- bleu
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
library_name: transformers
tags:
- Luganda
- Low-Resource
- Seq2Seq
- Distilled
- Machine Translation
- NLLB
- AfricaNLP
- tekjuice
---

# 🧠 Model Card: `reuben256/nllb-distilled-600-lug`

## 🌍 Overview

`reuben256/nllb-distilled-600-lug` is a fine-tuned version of Meta AI's [NLLB-200 distilled 600M](https://huggingface.co/facebook/nllb-200-distilled-600M) model for **English ↔ Luganda** machine translation.

It was developed by **tekjuice AI** 🧪 to support translation in **low-resource African languages**, specifically Luganda 🇺🇬, a widely spoken Bantu language in Uganda.

---

## 🚀 Use Cases

This model is designed for:

- 📚 Translating educational and public health materials
- 📰 Localizing government or NGO communications
- 🔬 Supporting linguistic and NLP research
- 🧩 Enabling cross-lingual tasks via translation (e.g., summarization, QA)

---

## 📦 Training Data

Fine-tuned on the dataset [`reuben256/tekjuice-eng-lug-target`](https://huggingface.co/datasets/reuben256/tekjuice-eng-lug-target), which includes:

- 📖 Public-domain and open-source parallel corpora
- 🌐 Crowdsourced and community-translated sentences
- 🗞️ Aligned media and educational content

---

## 📊 Evaluation

The model was evaluated with the **BLEU** metric 📘, which measures n-gram precision against reference translations. Testing was done on a held-out set with domain characteristics similar to the training data.

> ⚠️ Note: Human evaluation is recommended for assessing fluency, nuance, and cultural accuracy.

---

## 🏗️ Base Model

Built on top of:

- 🧬 `facebook/nllb-200-distilled-600M`, a distilled multilingual model optimized for **speed** and **low-resource language performance**.
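To make the BLEU metric concrete, here is a minimal, illustrative sketch of how it works: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. This is a toy implementation for intuition only; actual scores should come from a standard tool such as `sacrebleu`.

```python
from collections import Counter
import math

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty discourages overly short hypotheses
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(f"{bleu('the cat sat on the mat', 'the cat sat on the mat'):.3f}")  # 1.000
```

Identical sentences score 1.0, while sentences sharing no n-grams score near 0; real implementations add smoothing and corpus-level aggregation on top of this basic scheme.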
--- ## โš ๏ธ Limitations - โŒ May struggle with slang, idioms, and culturally specific phrases - ๐Ÿ“‰ Biases in training data may be reflected in outputs - ๐Ÿ’ก Performance may degrade on out-of-domain or highly technical content --- ## ๐Ÿ”ฎ Future Plans Coming improvements: - ๐Ÿ“ˆ Larger and more diverse datasets - ๐Ÿ” Reverse direction (Luganda โ†’ English) - ๐Ÿฅ Domain-specific fine-tuning (e.g., health, legal) - ๐Ÿง  Quality estimation and confidence scoring --- ## ๐Ÿš€ How to Use ```python from transformers import AutoModelForSeq2SeqLM, AutoTokenizer MODEL = "reuben256/nllb-distilled-600-lug" tokenizer = AutoTokenizer.from_pretrained(MODEL) model = AutoModelForSeq2SeqLM.from_pretrained(MODEL) tokenizer.src_lang = "eng_Latn" tokenizer.tgt_lang = "lug_Latn" text = "Farmers should plant more trees?" inputs = tokenizer(text, return_tensors="pt") translated_tokens = model.generate(**inputs) print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))