NaijaReviewer-8B: Q5_0 GGUF

Llama 3.1 8B Instruct, fine-tuned with QLoRA on Nigerian product reviews. This repository holds the Q5_0 GGUF build, the quality-leaning quantisation in the NaijaReviewer-8B family: roughly 5 bits per weight and ~6 GB on disk, with a near-imperceptible quality drop versus FP16, while still running comfortably on a single consumer or commodity-cloud GPU.

NaijaReviewer-8B is the model behind the Naija Persona Agent, a Nigerian-context AI system for review simulation (Task A) and persona-aware recommendation (Task B), submitted to the DSN X BCT LLM Agent Challenge.

Which quantisation should I use?

| Build | Bits per weight | Size | Quality | Use case |
| --- | --- | --- | --- | --- |
| Q4_K_M | ~4 | ~5 GB | Balanced (Ollama default) | Production serverless, smaller VMs, mobile-class inference. |
| Q5_0 (this repo) | ~5 | ~6 GB | Recommended for quality | Local inference where quality matters more than disk footprint. |
| Q8_0 | 8 | ~8.5 GB | Near-lossless | Reference / sanity-check runs. |

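Choose Q5_0 if you have a little more than ~6 GB of free VRAM (or system RAM) and want a quality-leaning quant: the weights alone take ~6 GB, and the KV cache and runtime buffers need room on top of that.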
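As a rough sizing aid, the sketch below estimates memory as weights plus KV cache. It assumes the public Llama 3.1 8B architecture values (32 layers, 8 KV heads, head dimension 128), an fp16 KV cache, and the ~6 GB file size quoted above. The function names are illustrative, and runtime scratch buffers are not counted, so treat the result as a lower bound.

```python
# Assumed Llama 3.1 8B architecture values (from the public base-model config).
N_LAYERS = 32
N_KV_HEADS = 8      # grouped-query attention: fewer KV heads than query heads
HEAD_DIM = 128


def kv_cache_bytes(n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Bytes for the K and V caches at a given context length (fp16 by default)."""
    # Factor 2 covers the separate K and V tensors in every layer.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx * bytes_per_elem


def estimate_total_gb(weights_gb: float, n_ctx: int) -> float:
    """Weights plus KV cache in decimal GB; scratch buffers are NOT included."""
    return weights_gb + kv_cache_bytes(n_ctx) / 1e9


if __name__ == "__main__":
    for n_ctx in (2048, 4096, 8192):
        print(f"n_ctx={n_ctx}: ~{estimate_total_gb(6.0, n_ctx):.2f} GB")
```

At the 4096-token context used elsewhere in this card, the KV cache adds about half a gigabyte under these assumptions.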

Headline numbers

| Metric | NaijaReviewer-8B | Frontier baseline (Claude Sonnet 4) |
| --- | --- | --- |
| Task A rating RMSE (lower is better) | 1.114 | 1.319 (NaijaReviewer-8B is 15.5% lower) |
| Task A Nigerian-rater win-rate, 5 raters / 50 pairs | 48.5% (CI [40.2, 56.9]) | 51.5% (statistical parity) |
| Task B NDCG@10 vs four heavyweight baselines | 0.588 (best in field) | 0.430-0.441 |
| Parameters | 8B | 70B-120B+ |
| Per-call API cost | Zero (open weights) | Metered ($ per 1k tokens) |
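For readers unfamiliar with the two headline metrics, the sketch below uses the standard textbook definitions of RMSE and NDCG@k (linear gain, log2 discount). It is not the project's evaluation code: the function names and the choice of gain variant are assumptions, and the project's own scoring may differ in detail.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relative RMSE gap implied by the Task A row of the table.
relative_reduction = (1.319 - 1.114) / 1.319

if __name__ == "__main__":
    print(f"RMSE reduction vs baseline: {relative_reduction:.1%}")
    print(f"NDCG@10 for a ranking with its only hit in third place: {ndcg_at_k([0, 0, 1]):.3f}")
```

`relevances` is the list of graded relevance labels in the order the system ranked the items, so a perfect ordering scores 1.0.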

What's in this repo

  • naija-reviewer-8b-v2-Q5_0.gguf: the quantised model (~6 GB).
  • Modelfile: Ollama configuration encoding the exact Alpaca prompt template the model was trained on, with the correct stop tokens.
  • This README and the license.

Quick start

Ollama

huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
  naija-reviewer-8b-v2-Q5_0.gguf Modelfile --local-dir .
ollama create naija-reviewer-8b -f Modelfile
ollama run naija-reviewer-8b "Write a review of a Tecno Spark 10 phone from a Lagos Bolt driver."

llama.cpp

huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
  naija-reviewer-8b-v2-Q5_0.gguf --local-dir .

./llama-cli -m naija-reviewer-8b-v2-Q5_0.gguf \
  -p "### Instruction\nWrite a product review.\n\n### Input\nA Tecno Spark 10 mobile phone, NGN 145,000.\n\n### Response\n" \
  -n 256 --temp 0.7 --top-p 0.9 -no-cnv

llama-cpp-python

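from llama_cpp import Llama

# Download the GGUF from the Hub (cached after the first run) and load it.
llm = Llama.from_pretrained(
    repo_id="Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF",
    filename="naija-reviewer-8b-v2-Q5_0.gguf",
    n_ctx=4096,        # context window, matching the training sequence length
    n_gpu_layers=-1,   # offload every layer to the GPU; use 0 for CPU-only
)

# Use the Alpaca template the model was fine-tuned on (see "Prompt format")
# rather than the Llama 3 chat template.
prompt = (
    "### Instruction\n"
    "Write a short, authentic Pidgin product review.\n\n"
    "### Input\n"
    "An Oraimo wireless earbud.\n\n"
    "### Response\n"
)
out = llm.create_completion(
    prompt,
    max_tokens=256,
    temperature=0.7,
    stop=["### Instruction", "### Input", "### Response"],
)
print(out["choices"][0]["text"])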

Serverless GPU deployment (Modal)

The reference production deployment uses Q4_K_M for the lowest cold-start cost on a serverless L4 GPU. To swap to Q5_0, change HF_REPO in deploy/modal_naija.py to Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF and redeploy. Expect ~20% more memory (about 6 GB of weights instead of 5 GB) and slightly slower warm latency in exchange for higher fidelity.

Prompt format

Fine-tuned on the Alpaca template:

### Instruction
{instruction}

### Input
{input}

### Response
{response}

Stop sequences: ### Instruction, ### Input, ### Response. The included Ollama Modelfile encodes these along with a Nigerian-context system prompt.

For Nigerian-context use, pass the structured persona JSON (cognitive dimensions + register tier + aspect priorities) as ### Input. The production prompt templates are in the project repo under app/prompts/.
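A minimal sketch of assembling such a prompt in Python follows. The template and stop sequences come from this section; the persona field names and values are hypothetical placeholders, not the project's actual schema (the real templates live under app/prompts/).

```python
import json

# Stop sequences listed in this section.
STOP_SEQUENCES = ["### Instruction", "### Input", "### Response"]


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the Alpaca template, leaving the response slot open for the model."""
    return (
        f"### Instruction\n{instruction}\n\n"
        f"### Input\n{input_text}\n\n"
        "### Response\n"
    )


# Hypothetical persona: keys and values are illustrative only.
persona = {
    "cognitive_dimensions": {"price_sensitivity": "high"},
    "register_tier": "pidgin",
    "aspect_priorities": ["battery", "price"],
}

prompt = build_alpaca_prompt(
    "Write a product review in this persona's voice.",
    json.dumps(persona, ensure_ascii=False),
)

if __name__ == "__main__":
    print(prompt)
```

The resulting string can be passed to any completion-style API together with `STOP_SEQUENCES`, so that generation halts before the model starts a new template block.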

Training

| Setting | Value |
| --- | --- |
| Base model | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Method | QLoRA via Unsloth |
| Adapter | LoRA r=16, alpha=32, dropout=0.1, targets q/k/v/o/up/gate/down (0.52% trainable params) |
| Loss | Response-only loss via train_on_responses_only |
| Tokenisation | EOS-terminated training examples |
| Schedule | 2 epochs, effective batch size 16, learning rate 2e-4 with cosine decay, sequence length 4096 |
| Quantisation (this file) | Q5_0, ~6 GB |
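The 0.52% trainable-parameter figure can be sanity-checked from the adapter settings. The sketch below assumes the public Llama 3.1 8B shapes (hidden size 4096, MLP size 14336, 32 layers, 8 KV heads of dimension 128) and a base parameter count of 8,030,261,248. A LoRA adapter of rank r on a d_in x d_out linear layer adds r * (d_in + d_out) weights.

```python
# Assumed Llama 3.1 8B shapes (from the public base-model config).
HIDDEN = 4096
INTERMEDIATE = 14336
N_LAYERS = 32
KV_DIM = 8 * 128  # 8 KV heads x head dimension 128
BASE_PARAMS = 8_030_261_248

# (input dim, output dim) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Total LoRA weights: the A (d_in x r) and B (r x d_out) matrices per target."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return N_LAYERS * per_layer


trainable = lora_params(16)
share = trainable / (BASE_PARAMS + trainable)

if __name__ == "__main__":
    print(f"trainable LoRA parameters: {trainable:,}")
    print(f"share of all parameters:   {share:.2%}")
```

Under these assumptions the adapter has about 41.9 million trainable weights, which matches the 0.52% quoted in the table.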

Training data

Trained on Shinzmann/npa-corpus-v1 (~20,000 Alpaca-style instruction/response pairs), built from two real public Jumia sources plus synthetic expansion. Full provenance and the build pipeline are in the dataset card.
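To give a feel for the scale of the run, the sketch below combines the approximate corpus size with the schedule from the Training section. It assumes exactly 20,000 examples, no warmup, and cosine decay to zero. None of these details are confirmed by this card, so the numbers are indicative only.

```python
import math


def total_steps(n_examples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps for the whole run (the last partial batch counts as a step)."""
    return math.ceil(n_examples / effective_batch) * epochs


def cosine_lr(step: int, total: int, peak: float = 2e-4, floor: float = 0.0) -> float:
    """Cosine decay from `peak` at step 0 to `floor` at the final step."""
    progress = min(max(step / total, 0.0), 1.0)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


steps = total_steps(20_000, 16, 2)

if __name__ == "__main__":
    print(f"total optimizer steps: {steps}")
    for s in (0, steps // 2, steps):
        print(f"step {s}: lr = {cosine_lr(s, steps):.2e}")
```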

Intended use and limitations

Intended use. Generation of Nigerian-context product reviews and ratings (Task A); persona-aware re-ranking of product recommendations (Task B); research on register-aware text generation in low-resource African contexts.

Limitations. Trained primarily on Nigerian English and Nigerian Pidgin product reviews. The training corpus is partly synthetic; to mitigate confounds, two independent generator pipelines were used, with disjoint generator families for training and evaluation data. In a 3-arbiter LLM-as-Judge evaluation, frontier LLM judges showed a systematic preference for Claude Sonnet 4's prose register, whereas Nigerian human raters scored the two systems at parity on the same pairs. We read this as evidence that single-judge LLM evaluation is insufficient on culturally localised content.

Citation

@misc{naijareviewer8b2026,
  title  = {NaijaReviewer-8B: A Nigerian-Context Open-Weight Fine-Tune for Persona-Aware Review Generation},
  author = {Ashinze, Emmanuel and Uvere, Franca and Oyenekan, Esther},
  year   = {2026},
  url    = {https://huggingface.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF}
}

License

Subject to the Llama 3.1 Community License. Released for research and non-commercial use; commercial use must comply with the upstream Llama 3.1 terms.
