Instructions to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with llama-cpp-python:

```
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF",
    filename="naija-reviewer-8b-v2-q5_0.gguf",
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
```

llama.cpp

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with llama.cpp:

Install from brew

```
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
# Run inference directly in the terminal:
llama-cli -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Install from WinGet (Windows)

```
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
# Run inference directly in the terminal:
llama-cli -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Use pre-built binary

```
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
# Run inference directly in the terminal:
./llama-cli -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Build from source code

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Use Docker

```
docker model run hf.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

vLLM

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
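The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM or llama.cpp, and the base URL assumes the server is listening on the port shown in the curl example.

```python
import json
import urllib.request

MODEL_ID = "Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF"


def build_chat_request(base_url, model, prompt):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, prompt, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request needs no running server, so it can be inspected first.
request = build_chat_request(
    "http://localhost:8000", MODEL_ID, "What is the capital of France?"
)
print(request.full_url)
```

Once a server is running, `chat("http://localhost:8000", MODEL_ID, "...")` should return the reply text. The same helper should work against `llama-server` by pointing it at `http://localhost:8080`, the address used in the Pi and Hermes configurations on this page.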

Ollama
How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Ollama:
```
ollama run hf.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Unsloth Studio

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF to start chatting
```

Install Unsloth Studio (Windows)

```
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF to start chatting.

Pi

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Pi:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Configure the model in Pi

```
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to ~/.pi/agent/models.json:

```
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0"
        }
      ]
    }
  }
}
```

Run Pi

```
# Start Pi in your project directory:
pi
```

Hermes Agent

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Hermes Agent:

Start the llama.cpp server

```
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Configure Hermes

```
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Run Hermes

```
hermes
```

Docker Model Runner
How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Docker Model Runner:
```
docker model run hf.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Lemonade

How to use Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF with Lemonade:

Pull the model

```
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF:Q5_0
```

Run and chat with the model

```
lemonade run user.naija-reviewer-8b-v2-Q5_0-GGUF-Q5_0
```

List all available models

```
lemonade list
```

NaijaReviewer-8B — Q5_0 GGUF

A QLoRA fine-tune of Llama 3.1 8B Instruct on Nigerian product reviews. This repository holds the Q5_0 GGUF build, a higher-quality quantisation than the Q4_K_M build in the NaijaReviewer-8B family: roughly 5 bits per weight and ~6 GB on disk, with a near-imperceptible quality drop versus FP16, while still running comfortably on a single consumer or commodity-cloud GPU.

NaijaReviewer-8B is the model behind the Naija Persona Agent, a Nigerian-context AI system for review simulation (Task A) and persona-aware recommendation (Task B), submitted to the DSN X BCT LLM Agent Challenge.

Live application: https://switteefranca2-0--naijapersona-web.modal.run/
Source code: https://github.com/Mystique1337/telcoproject
Companion artifacts: Q4_K_M GGUF · Merged HF + all GGUFs · LoRA adapter · Training corpus

Which quantisation should I use?

| Build | Bits | Size | Quality | Use case |
| --- | --- | --- | --- | --- |
| Q4_K_M | ~4 | ~5 GB | Balanced (Ollama default) | Production serverless, smaller VMs, mobile-class inference. |
| Q5_0 (this repo) | ~5 | ~6 GB | Recommended for quality | Local inference where quality matters more than disk footprint. |
| Q8_0 | 8 | ~8.5 GB | Near-lossless | Reference / sanity-check runs. |

Choose Q5_0 if you have ~6 GB of free VRAM (or system RAM) and want a quality-leaning quant.
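The sizes in the table can be roughly reproduced from bits per weight. The sketch below is a back-of-the-envelope estimate, not a measurement of the actual files: the parameter count (about 8.03 billion for Llama 3.1 8B) and the Q4_K_M average are assumptions, and real GGUF files differ slightly because some tensors use other types and the file carries metadata.

```python
# Bits per weight for each GGUF type. Q5_0 and Q8_0 follow from their block
# layouts (32 weights plus one 16-bit scale per block); the Q4_K_M figure is an
# approximate average, because that scheme mixes tensor types.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_0": 5.5, "Q8_0": 8.5}

# Assumed parameter count for Llama 3.1 8B (about 8.03 billion).
N_PARAMS = 8.03e9


def estimate_size_gb(n_params, bits_per_weight):
    """Approximate file size in GB (10^9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_size_gb(N_PARAMS, bpw):.1f} GB")
```

Under these assumptions the estimates come out near 4.9, 5.5 and 8.5 GB, in line with the rounded figures in the table. Inference needs additional memory for the KV cache and runtime buffers on top of the weights.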

Headline numbers

| Metric | NaijaReviewer-8B | Frontier baseline (Claude Sonnet 4) |
| --- | --- | --- |
| Task A rating RMSE (lower is better) | 1.114 (15.5% lower) | 1.319 |
| Task A Nigerian-rater win-rate, 5 raters / 50 pairs | 48.5% (CI [40.2, 56.9]) | 51.5% (statistical parity) |
| Task B NDCG@10 vs four heavyweight baselines | 0.588 (best in field) | 0.430-0.441 |
| Parameters | 8B | 70B-120B+ |
| Per-call API cost | Zero (open weights) | $/1k tokens |
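The relative RMSE gap depends on which system is taken as the reference, so it is worth checking from the two reported values. A short sketch (the helper name `relative_change` is illustrative):

```python
def relative_change(value, reference):
    """Percentage by which `value` differs from `reference`."""
    return (value - reference) / reference * 100.0


model_rmse = 1.114      # NaijaReviewer-8B, Task A
baseline_rmse = 1.319   # Claude Sonnet 4, Task A

# Model relative to baseline: negative means a lower (better) RMSE.
print(f"{relative_change(model_rmse, baseline_rmse):.1f}%")   # -15.5%
# Baseline relative to model.
print(f"{relative_change(baseline_rmse, model_rmse):.1f}%")   # 18.4%
```

So the fine-tuned model's RMSE is 15.5% lower than the baseline's, or equivalently the baseline's RMSE is 18.4% higher than the model's.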

What's in this repo

naija-reviewer-8b-v2-Q5_0.gguf — the quantised model (~6 GB).
Modelfile — Ollama configuration encoding the exact Alpaca prompt template the model was trained on, with the correct stop tokens.
This README and the license.

Quick start

Ollama

```
huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
  naija-reviewer-8b-v2-Q5_0.gguf Modelfile --local-dir .
ollama create naija-reviewer-8b -f Modelfile
ollama run naija-reviewer-8b "Write a review of a Tecno Spark 10 phone from a Lagos Bolt driver."
```

llama.cpp

```
huggingface-cli download Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF \
  naija-reviewer-8b-v2-Q5_0.gguf --local-dir .

./llama-cli -m naija-reviewer-8b-v2-Q5_0.gguf \
  -p "### Instruction\nWrite a product review.\n\n### Input\nA Tecno Spark 10 mobile phone, NGN 145,000.\n\n### Response\n" \
  -n 256 --temp 0.7 --top-p 0.9 -no-cnv
```

llama-cpp-python

```
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF",
    filename="naija-reviewer-8b-v2-Q5_0.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,
    chat_format="llama-3",
)
out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Write a short, authentic Pidgin review of an Oraimo wireless earbud."
    }],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```

Serverless GPU deployment (Modal)

The reference production deployment uses Q4_K_M for the lowest cold-start cost on a serverless L4 GPU. To swap to Q5_0, change `HF_REPO` in `deploy/modal_naija.py` to `Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF` and redeploy. Expect ~20% more memory and slightly slower warm latency in exchange for higher fidelity.

Prompt format

Fine-tuned on the Alpaca template:

```
### Instruction
{instruction}

### Input
{input}

### Response
{response}
```

Stop sequences: ### Instruction, ### Input, ### Response. The included Ollama Modelfile encodes these along with a Nigerian-context system prompt.

For Nigerian-context use, pass the structured persona JSON (cognitive dimensions + register tier + aspect priorities) as ### Input. The production prompt templates are in the project repo under app/prompts/.
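As a rough illustration of the template and stop sequences described above, the sketch below assembles an Alpaca-style prompt with a persona passed as the Input. The persona keys are hypothetical placeholders, not the project's actual schema (the real templates live under app/prompts/ in the project repo), and `build_prompt` is an illustrative helper.

```python
import json

# Stop sequences listed in this section.
STOP_SEQUENCES = ["### Instruction", "### Input", "### Response"]


def build_prompt(instruction, input_text=""):
    """Fill the Alpaca template up to the point where the model starts writing."""
    return (
        f"### Instruction\n{instruction}\n\n"
        f"### Input\n{input_text}\n\n"
        "### Response\n"
    )


# Hypothetical persona: these keys are illustrative, not the project's schema.
persona = {
    "register_tier": "pidgin",
    "aspect_priorities": ["battery", "price"],
}

prompt = build_prompt(
    "Write a product review in the voice of this persona.",
    json.dumps(persona),
)
print(prompt)
```

With llama-cpp-python, a prompt built this way could be passed to `llm.create_completion(prompt, max_tokens=256, stop=STOP_SEQUENCES)` so that generation halts if the model begins a new template section.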

Training


| Setting | Value |
| --- | --- |
| Base model | `meta-llama/Meta-Llama-3.1-8B-Instruct` |
| Method | QLoRA via Unsloth |
| Adapter | LoRA r=16, alpha=32, dropout=0.1, targets q/k/v/o/up/gate/down (0.52% trainable params) |
| Loss | Response-only loss via `train_on_responses_only` |
| Tokenisation | EOS-terminated training examples |
| Schedule | 2 epochs, effective batch size 16, learning rate 2e-4 with cosine decay, sequence length 4096 |
| Quantisation (this file) | `Q5_0`, ~6 GB |
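The 0.52% trainable-parameter figure can be sanity-checked from the adapter rank and the base model's layer shapes. This is a sketch under assumptions: the dimensions and total parameter count below are taken from the published Llama 3.1 8B configuration rather than from this project's training logs, and `lora_param_count` is an illustrative helper.

```python
# Llama 3.1 8B dimensions (assumed from the published model config).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
N_LAYERS = 32
TOTAL_PARAMS = 8_030_261_248  # assumed base-model parameter count

RANK = 16  # LoRA r from the table above

# (input dim, output dim) of every targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes, n_layers):
    """Each adapted d_in x d_out matrix adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


trainable = lora_param_count(RANK, TARGETS, N_LAYERS)
print(f"{trainable:,} trainable parameters")
print(f"{trainable / TOTAL_PARAMS * 100:.2f}% of the base model")
```

Under these assumptions the adapter has about 41.9 million trainable parameters, which is roughly 0.52% of the base model and agrees with the figure in the table.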

Training data

Trained on Shinzmann/npa-corpus-v1 (~20,000 Alpaca-style instruction/response pairs), built from two real public Jumia sources plus synthetic expansion. Full provenance and the build pipeline are in the dataset card.

Intended use and limitations

Intended use. Generation of Nigerian-context product reviews and ratings (Task A); persona-aware re-ranking of product recommendations (Task B); research on register-aware text generation in low-resource African contexts.

Limitations. Trained primarily on Nigerian English and Nigerian Pidgin product reviews. The training corpus is partly synthetic; two independent generator pipelines and disjoint train/eval generator families were used to mitigate confounds. On a 3-arbiter LLM-as-Judge evaluation, frontier LLM judges showed a systematic preference for Claude Sonnet 4's prose register; Nigerian human raters scored the two systems at parity on the same pairs, which we read as evidence that single-judge LLM evaluation is insufficient on culturally-localised content.

Citation

```
@misc{naijareviewer8b2026,
  title  = {NaijaReviewer-8B: A Nigerian-Context Open-Weight Fine-Tune for Persona-Aware Review Generation},
  author = {Ashinze, Emmanuel and Uvere, Franca and Oyenekan, Esther},
  year   = {2026},
  url    = {https://huggingface.co/Shinzmann/naija-reviewer-8b-v2-Q5_0-GGUF}
}
```

License

Subject to the Llama 3.1 Community License. Released for research and non-commercial use; commercial use must comply with the upstream Llama 3.1 terms.
