NaijaReviewer-8B: Q5_0 GGUF
NaijaReviewer-8B is a QLoRA fine-tune of Llama 3.1 8B Instruct on Nigerian product reviews. This repository holds the Q5_0 GGUF build, the higher-quality quantisation in the NaijaReviewer-8B family: roughly 5 bits per weight, ~6 GB on disk, and a near-imperceptible quality drop versus FP16, while still running comfortably on a single consumer or commodity-cloud GPU.
NaijaReviewer-8B is the model behind the Naija Persona Agent, a Nigerian-context AI system for review simulation (Task A) and persona-aware recommendation (Task B), submitted to the DSN X BCT LLM Agent Challenge.
- Live application: https://switteefranca2-0--naijapersona-web.modal.run/
- Source code: https://github.com/Mystique1337/telcoproject
- Companion artifacts: Q4_K_M GGUF · Merged HF + all GGUFs · LoRA adapter · Training corpus
Which quantisation should I use?
| Build | Bits | Size | Quality | Use case |
|---|---|---|---|---|
| Q4_K_M | ~4 | ~5 GB | Balanced (Ollama default) | Production serverless, smaller VMs, mobile-class inference. |
| Q5_0 (this repo) | ~5 | ~6 GB | Recommended for quality | Local inference where quality matters more than disk footprint. |
| Q8_0 | 8 | ~8.5 GB | Near-lossless | Reference / sanity-check runs. |
Choose Q5_0 if you have ~6 GB of free VRAM (or system RAM) and want a quality-leaning quant.
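As a rough cross-check on the sizes in the table, file size scales with parameter count times bits per weight. The sketch below assumes about 8.03 billion parameters for Llama 3.1 8B and llama.cpp's block layouts, where Q5_0 and Q8_0 store one 16-bit scale per 32 weights (5.5 and 8.5 bits per weight). The 4.85 figure for Q4_K_M is an approximate average for a mixed-precision format. Real files differ somewhat because some tensors are stored at other precisions, and runtime memory also has to hold the KV cache.

```python
# Rough GGUF size estimate: parameters x bits per weight / 8.
# Assumption: ~8.03e9 parameters; bits-per-weight values are approximate.
N_PARAMS = 8.03e9

BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # mixed-precision format, approximate average
    "Q5_0": 5.5,     # 32 weights in 22 bytes (5-bit values + fp16 scale)
    "Q8_0": 8.5,     # 32 weights in 34 bytes (8-bit values + fp16 scale)
}


def estimated_size_gb(bits_per_weight: float, n_params: float = N_PARAMS) -> float:
    """Return the estimated file size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimated_size_gb(bpw):.1f} GB")
```

The estimates land close to the ~5 GB, ~6 GB and ~8.5 GB entries in the table, so the "Size" column is what you would expect from the bit widths alone.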
Headline numbers
| Metric | NaijaReviewer-8B | Frontier baseline (Claude Sonnet 4) |
|---|---|---|
| Task A rating RMSE (lower is better) | 1.114 | 1.319 (NaijaReviewer-8B is 15.5% lower) |
| Task A Nigerian-rater win-rate, 5 raters / 50 pairs | 48.5% (CI [40.2, 56.9]) | 51.5% (statistical parity) |
| Task B NDCG@10 vs four heavyweight baselines | 0.588 (best in field) | 0.430-0.441 |
| Parameters | 8B | 70B-120B+ |
| Per-call API cost | Zero (open weights) | $/1k tokens |
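The relative gap between the two Task A RMSE figures depends on which system is taken as the reference. A minimal sketch of that arithmetic, using only the two values from the table:

```python
# Relative difference between the two Task A RMSE values in the table.
rmse_finetune = 1.114   # NaijaReviewer-8B
rmse_baseline = 1.319   # Claude Sonnet 4


def relative_change(value: float, reference: float) -> float:
    """Signed change of `value` relative to `reference`, in percent."""
    return (value - reference) / reference * 100


# Fine-tune measured against the baseline (negative means lower error).
finetune_vs_baseline = relative_change(rmse_finetune, rmse_baseline)
# Baseline measured against the fine-tune.
baseline_vs_finetune = relative_change(rmse_baseline, rmse_finetune)

print(f"{finetune_vs_baseline:.1f}%")  # -15.5%
print(f"{baseline_vs_finetune:.1f}%")  # 18.4%
```

In other words, the fine-tune's RMSE sits about 15.5% below the baseline's, which is the same gap as the baseline's RMSE sitting about 18.4% above the fine-tune's.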
What's in this repo
- naija-reviewer-8b-v2-Q5_0.gguf: the quantised model (~6 GB).
- Modelfile: Ollama configuration encoding the exact Alpaca prompt template the model was trained on, with the correct stop tokens.
- This README and the license.
Quick start
Ollama
huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
naija-reviewer-8b-v2-Q5_0.gguf Modelfile --local-dir .
ollama create naija-reviewer-8b -f Modelfile
ollama run naija-reviewer-8b "Write a review of a Tecno Spark 10 phone from a Lagos Bolt driver."
llama.cpp
huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
naija-reviewer-8b-v2-Q5_0.gguf --local-dir .
./llama-cli -m naija-reviewer-8b-v2-Q5_0.gguf \
-p "### Instruction\nWrite a product review.\n\n### Input\nA Tecno Spark 10 mobile phone, NGN 145,000.\n\n### Response\n" \
-n 256 --temp 0.7 --top-p 0.9 -no-cnv
llama-cpp-python
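from llama_cpp import Llama

# Download the GGUF file from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
    repo_id="Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF",
    filename="naija-reviewer-8b-v2-Q5_0.gguf",
    n_ctx=4096,          # context window, matching the training sequence length
    n_gpu_layers=-1,     # offload all layers to the GPU
    chat_format="llama-3",
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Write a short, authentic Pidgin review of an Oraimo wireless earbud."
    }],
    max_tokens=256,
    temperature=0.7,
)

# The reply text is in the first choice of the OpenAI-style response dict.
print(out["choices"][0]["message"]["content"])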
Serverless GPU deployment (Modal)
The reference production deployment uses Q4_K_M for the lowest cold-start cost on a serverless L4 GPU. To swap to Q5_0, change HF_REPO in deploy/modal_naija.py to Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF and redeploy. Expect ~20% more memory use and slightly slower warm latency in exchange for higher fidelity.
Prompt format
Fine-tuned on the Alpaca template:
### Instruction
{instruction}

### Input
{input}

### Response
{response}
Stop sequences: ### Instruction, ### Input, ### Response. The included Ollama Modelfile encodes these along with a Nigerian-context system prompt.
For Nigerian-context use, pass the structured persona JSON (cognitive dimensions + register tier + aspect priorities) as ### Input. The production prompt templates are in the project repo under app/prompts/.
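A minimal sketch of assembling a prompt in this format, assuming blank lines between sections as in the llama.cpp example above and assuming the persona is serialised with `json.dumps`. The function name and the persona field names are hypothetical; the real schema lives in the project's prompt templates.

```python
import json

# Stop sequences listed in this README.
STOP_SEQUENCES = ["### Instruction", "### Input", "### Response"]


def build_alpaca_prompt(instruction: str, input_text: str) -> str:
    """Fill the Alpaca-style template, leaving the response open for generation."""
    return (
        f"### Instruction\n{instruction}\n\n"
        f"### Input\n{input_text}\n\n"
        "### Response\n"
    )


# Hypothetical persona; these keys are illustrative, not the project's schema.
persona = {
    "register_tier": "pidgin",
    "aspect_priorities": ["battery", "price"],
}

prompt = build_alpaca_prompt(
    "Write a product review.",
    json.dumps(persona, ensure_ascii=False),
)
print(prompt)
```

With llama-cpp-python, a string like this can be passed to the plain completion call, for example `llm(prompt, max_tokens=256, stop=STOP_SEQUENCES)`, instead of going through a chat template.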
Training
| Setting | Value |
|---|---|
| Base model | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Method | QLoRA via Unsloth |
| Adapter | LoRA r=16, alpha=32, dropout=0.1, targets q/k/v/o/up/gate/down (0.52% trainable params) |
| Loss | Response-only loss via train_on_responses_only |
| Tokenisation | EOS-terminated training examples |
| Schedule | 2 epochs, effective batch size 16, learning rate 2e-4 with cosine decay, sequence length 4096 |
| Quantisation (this file) | Q5_0, ~6 GB |
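The 0.52% trainable-parameter figure can be reproduced from the adapter settings. The sketch below assumes the published Llama 3.1 8B shapes (32 layers, hidden size 4096, MLP size 14336, 1024-wide key/value projections) and a base size of roughly 8.03 billion parameters; none of these numbers are stated in this card.

```python
# LoRA adds two matrices per target module: A (r x d_in) and B (d_out x r),
# so each module contributes r * (d_in + d_out) trainable parameters.
RANK = 16
NUM_LAYERS = 32                  # assumed Llama 3.1 8B depth
HIDDEN, KV_DIM, MLP = 4096, 1024, 14336
BASE_PARAMS = 8_030_261_248      # assumed base-model parameter count

# (d_in, d_out) for each targeted projection in one transformer layer.
TARGETS = {
    "q": (HIDDEN, HIDDEN),
    "k": (HIDDEN, KV_DIM),
    "v": (HIDDEN, KV_DIM),
    "o": (HIDDEN, HIDDEN),
    "gate": (HIDDEN, MLP),
    "up": (HIDDEN, MLP),
    "down": (MLP, HIDDEN),
}

per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in TARGETS.values())
lora_params = per_layer * NUM_LAYERS
share = lora_params / BASE_PARAMS * 100

print(f"{lora_params:,} trainable parameters ({share:.2f}% of the base model)")
```

Under these assumptions the adapter has 41,943,040 trainable parameters, about 0.52% of the base model, which agrees with the figure in the table.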
Training data
Trained on Shinzmann/npa-corpus-v1 (~20,000 Alpaca-style instruction/response pairs), built from two real public Jumia sources plus synthetic expansion. Full provenance and the build pipeline are in the dataset card.
Intended use and limitations
Intended use. Generation of Nigerian-context product reviews and ratings (Task A); persona-aware re-ranking of product recommendations (Task B); research on register-aware text generation in low-resource African contexts.
Limitations. Trained primarily on Nigerian English and Nigerian Pidgin product reviews. The training corpus is partly synthetic; two independent generator pipelines and disjoint train/eval generator families were used to mitigate confounds. On a 3-arbiter LLM-as-Judge evaluation, frontier LLM judges showed a systematic preference for Claude Sonnet 4's prose register, whereas Nigerian human raters scored the two systems at parity on the same pairs. We read this as evidence that single-judge LLM evaluation is insufficient on culturally localised content.
Citation
@misc{naijareviewer8b2026,
title = {NaijaReviewer-8B: A Nigerian-Context Open-Weight Fine-Tune for Persona-Aware Review Generation},
author = {Ashinze, Emmanuel and Uvere, Franca and Oyenekan, Esther},
year = {2026},
url = {https://huggingface.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF}
}
License
Subject to the Llama 3.1 Community License. Released for research and non-commercial use; commercial use must comply with the upstream Llama 3.1 terms.