
Libraries

How to use nshportun/usa-immigration-llama-3.2-3b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nshportun/usa-immigration-llama-3.2-3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nshportun/usa-immigration-llama-3.2-3b")
model = AutoModelForCausalLM.from_pretrained("nshportun/usa-immigration-llama-3.2-3b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use nshportun/usa-immigration-llama-3.2-3b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nshportun/usa-immigration-llama-3.2-3b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nshportun/usa-immigration-llama-3.2-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
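The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style chat completion body. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "nshportun/usa-immigration-llama-3.2-3b"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000", timeout=120):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(prompt, base_url=base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style response layout: choices -> message -> content.
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(ask("What is Form I-485 used for?"))
```

Passing `base_url="http://localhost:30000"` should target a server on port 30000 instead, since the curl examples on this page use the same route for both servers.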

Use Docker

docker model run hf.co/nshportun/usa-immigration-llama-3.2-3b

SGLang

How to use nshportun/usa-immigration-llama-3.2-3b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nshportun/usa-immigration-llama-3.2-3b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nshportun/usa-immigration-llama-3.2-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nshportun/usa-immigration-llama-3.2-3b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nshportun/usa-immigration-llama-3.2-3b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use nshportun/usa-immigration-llama-3.2-3b with Docker Model Runner:
```
docker model run hf.co/nshportun/usa-immigration-llama-3.2-3b
```

USA Immigration Law — Llama 3.2 3B

Fine-tuned from meta-llama/Llama-3.2-3B-Instruct on the nshportun/usa-immigration-law-qa dataset — 17,058 source-grounded Q&A pairs covering all major U.S. immigration subdomains.

Training Details

Setting	Value
Base model	Llama 3.2 3B Instruct
Method	LoRA (r=8, alpha=32, merged into base weights)
Training pairs	16,065
Eval pairs	993 (stratified across 13 subdomains)
Epochs	1
Batch size	1 per device (int8 quantization)
Learning rate	1e-4
Max input length	512 tokens
Infrastructure	AWS SageMaker ml.g5.2xlarge (24GB VRAM)
Train loss	0.894
Eval loss	0.903
Eval perplexity	2.47
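
The perplexity and dataset totals in this table can be checked with a few lines of arithmetic. This sketch assumes the reported eval loss is the mean per-token cross-entropy in nats, in which case perplexity is simply its exponential.

```python
import math

eval_loss = 0.903                      # "Eval loss" row above
train_pairs, eval_pairs = 16_065, 993  # "Training pairs" and "Eval pairs" rows

# Perplexity is exp(mean cross-entropy) when the loss is measured in nats.
perplexity = math.exp(eval_loss)
total_pairs = train_pairs + eval_pairs

print(f"perplexity  = {perplexity:.2f}")  # 2.47
print(f"total pairs = {total_pairs:,}")   # 17,058
```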

Benchmark Results

Evaluated on a stratified random sample of 101 questions from the held-out eval set, spanning the 13 immigration subdomains plus a General category (14 rows in each per-subdomain table). Each answer was scored 0–3 by an LLM judge (Claude Sonnet 4.6) against a reference answer drawn from official sources.

Scoring scale: 0 = wrong/hallucinated · 1 = partially correct · 2 = mostly correct · 3 = fully correct

Evaluation date: 2026-05-17
Judge model: us.anthropic.claude-sonnet-4-6 (Amazon Bedrock)
Eval set source: nshportun/usa-immigration-law-qa, split=eval, seed=42
Fine-tuned model inference: local CPU (transformers 5.8.1, bfloat16, device_map=cpu)
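
Both headline metrics follow directly from the per-question judge scores. Here is a minimal sketch of that aggregation; the function name `summarize` and the toy scores are made up for illustration and are not taken from the benchmark scripts or results.

```python
def summarize(scores):
    """Return (mean score, percent fully correct, n) for a list of 0-3 judge scores."""
    if not scores:
        raise ValueError("no scores to summarize")
    if any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("scores must be integers from 0 to 3")
    n = len(scores)
    mean = sum(scores) / n
    # "Fully correct" means the judge assigned the top score of 3.
    pct_full = 100 * sum(s == 3 for s in scores) / n
    return round(mean, 2), round(pct_full, 1), n


toy_scores = [3, 2, 0, 1, 0, 0, 1, 3]  # illustrative values only
print(summarize(toy_scores))           # (1.25, 25.0, 8)
```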

Overall Scores

Model	Mean Score (0–3)	% Fully Correct (score=3)	N
Llama 3.2 3B fine-tuned (this model)	0.68	7.9%	101
Claude Sonnet 4.6 zero-shot	1.47	25.7%	101
Llama 3 8B zero-shot (base family)	0.80	2.0%	101

Why baselines matter: Claude Sonnet 4.6 is a frontier model far larger than this 3B model. Llama 3 8B zero-shot gets only 2.0% fully correct (2 of 101) on these domain-specific questions, which shows how hard the task is. The fine-tuned 3B model gets 7.9% fully correct (8 of 101), beating the zero-shot 8B baseline on that metric despite being about 2.7x smaller. On mean score, however, it trails the 8B baseline (0.68 vs. 0.80).

By Subdomain — Llama 3.2 3B Fine-tuned (this model)

Subdomain	Mean Score	% Fully Correct	N
Travel documents	1.83	33.3%	6
Naturalization	1.13	25.0%	8
Statistics	1.13	12.5%	8
Appeals	1.00	0.0%	3
Nonimmigrant visas	0.88	12.5%	8
Adjustment of status	0.75	0.0%	8
Employment authorization	0.75	12.5%	8
Asylum	0.50	12.5%	8
Admissibility	0.38	0.0%	8
Family-based immigration	0.38	0.0%	8
Humanitarian	0.38	0.0%	8
Removal	0.38	0.0%	8
General	0.25	0.0%	8
Employment-based (EB)	0.00	0.0%	4
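
As a consistency check, the overall row for this model can be recovered from the per-subdomain rows. The sketch below converts each rounded mean and percentage back to whole-number totals before averaging, which avoids compounding the rounding in the table. The values are copied from the table above.

```python
# (subdomain, mean score, % fully correct, N) copied from the table above
rows = [
    ("Travel documents", 1.83, 33.3, 6),
    ("Naturalization", 1.13, 25.0, 8),
    ("Statistics", 1.13, 12.5, 8),
    ("Appeals", 1.00, 0.0, 3),
    ("Nonimmigrant visas", 0.88, 12.5, 8),
    ("Adjustment of status", 0.75, 0.0, 8),
    ("Employment authorization", 0.75, 12.5, 8),
    ("Asylum", 0.50, 12.5, 8),
    ("Admissibility", 0.38, 0.0, 8),
    ("Family-based immigration", 0.38, 0.0, 8),
    ("Humanitarian", 0.38, 0.0, 8),
    ("Removal", 0.38, 0.0, 8),
    ("General", 0.25, 0.0, 8),
    ("Employment-based (EB)", 0.00, 0.0, 4),
]

total_n = sum(n for *_, n in rows)
# Scores are integers, so mean * N rounds back to the summed score per subdomain.
score_total = sum(round(mean * n) for _, mean, _, n in rows)
full_total = sum(round(pct / 100 * n) for _, _, pct, n in rows)

overall_mean = round(score_total / total_n, 2)
overall_pct = round(100 * full_total / total_n, 1)
print(total_n, overall_mean, overall_pct)  # 101 0.68 7.9
```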

By Subdomain — Claude Sonnet 4.6 Zero-Shot Baseline

Subdomain	Mean Score	% Fully Correct	N
Travel documents	2.33	33.3%	6
Adjustment of status	2.25	62.5%	8
Humanitarian	2.13	50.0%	8
Asylum	2.00	50.0%	8
Admissibility	1.50	25.0%	8
Naturalization	1.50	25.0%	8
Nonimmigrant visas	1.50	25.0%	8
Removal	1.25	12.5%	8
Statistics	1.25	12.5%	8
Family-based immigration	1.13	12.5%	8
Appeals	1.00	0.0%	3
Employment authorization	0.75	12.5%	8
Employment-based (EB)	0.75	25.0%	4
General	0.75	0.0%	8

By Subdomain — Llama 3 8B Zero-Shot Baseline

Subdomain	Mean Score	% Fully Correct	N
Adjustment of status	1.25	0.0%	8
Travel documents	1.17	0.0%	6
Asylum	1.13	12.5%	8
Removal	0.88	0.0%	8
Statistics	0.88	0.0%	8
Humanitarian	0.75	12.5%	8
Naturalization	0.75	0.0%	8
Admissibility	0.75	0.0%	8
Nonimmigrant visas	0.75	0.0%	8
Employment authorization	0.63	0.0%	8
General	0.63	0.0%	8
Employment-based (EB)	0.50	0.0%	4
Family-based immigration	0.50	0.0%	8
Appeals	0.33	0.0%	3

Key Observations

The task is genuinely hard: Even Claude Sonnet 4.6 (a frontier model) scores only 1.47/3.0 mean and 25.7% fully-correct. This reflects the highly specific, citation-level precision required by immigration procedural questions.
Fine-tuning boosts fully-correct rate: The 3B fine-tuned model achieves 7.9% fully correct (8 of 101) vs. 2.0% (2 of 101) for the zero-shot Llama 3 8B baseline, roughly a 4x improvement on full correctness despite being about 2.7x smaller and trained for only 1 epoch.
Strongest subdomains for fine-tuned model: travel documents (1.83), naturalization (1.13), statistics (1.13). Naturalization is well represented in the training data (~2,670 pairs), but travel documents (~109) and statistics (~141) are among the smallest subdomains, so training-set size alone does not explain these scores.
Weakest subdomains: employment-based (0.00) and general (0.25), followed by admissibility, family-based immigration, humanitarian, and removal (all 0.38). These topics require cross-referencing multiple USCIS form instructions or policy details.
Room for improvement: The fine-tuned model's mean (0.68) is below that of the zero-shot Llama 3 8B baseline (0.80), suggesting either that 1 epoch of training is insufficient or that the model needs more specific instruction tuning rather than completion-style fine-tuning.

Reproducing the Benchmark

# Clone repo and install deps (run from the repo root)
git clone https://github.com/nshportun/usa-immigration
cd usa-immigration
pip install -r requirements.txt

# Set environment variables (AWS Bedrock for baseline models + judge)
export ACCOUNT2_AWS_ACCESS_KEY_ID=...
export ACCOUNT2_AWS_SECRET_ACCESS_KEY=...

# Run baseline benchmark (Claude Sonnet + Llama 3 8B via Bedrock)
python scripts/benchmark/run_benchmark.py

# Run fine-tuned model inference on CPU (requires model artifacts locally)
# Download from: https://huggingface.co/nshportun/usa-immigration-llama-3.2-3b
python scripts/benchmark/run_local_finetuned.py

# Results written to:
#   data_local/benchmark/results.jsonl  (per-question scores)
#   data_local/benchmark/summary.json   (aggregate table)

The benchmark script supports resume — it skips already-scored questions. random.seed(42) ensures the same 101-question sample is selected each run.
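
The seeded sampling and resume behaviour described above can be sketched as follows. This is not the code in `run_benchmark.py`. The field names `id` and `subdomain`, the cap of 8 questions per subdomain (chosen to match the largest N in the tables), and the helper names are all assumptions made for illustration. It also uses a local `random.Random(seed)` rather than the global `random.seed(42)` the script is described as using.

```python
import json
import random
from collections import defaultdict
from pathlib import Path


def stratified_sample(rows, per_subdomain=8, seed=42):
    """Pick up to `per_subdomain` rows from each subdomain, reproducibly."""
    rng = random.Random(seed)  # local RNG, leaves global random state alone
    groups = defaultdict(list)
    for row in rows:
        groups[row["subdomain"]].append(row)
    sample = []
    for name in sorted(groups):  # fixed iteration order keeps runs identical
        group = groups[name]
        sample.extend(rng.sample(group, min(per_subdomain, len(group))))
    return sample


def scored_ids(results_path):
    """Read ids already present in a JSONL results file (empty set if missing)."""
    path = Path(results_path)
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        return {json.loads(line)["id"] for line in fh if line.strip()}


def pending(sample, results_path):
    """Rows from the sample that still need scoring."""
    done = scored_ids(results_path)
    return [row for row in sample if row["id"] not in done]
```

Because the sample depends only on the seed and the input rows, a rerun selects the same questions, and `pending` then filters out any whose ids already appear in the results file.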

Immigration Subdomains Covered

Subdomain	QA Pairs
Family-based immigration	~3,987
Naturalization	~2,670
Asylum	~2,094
Adjustment of status	~1,727
Removal	~1,277
Humanitarian	~894
Employment authorization	~832
Admissibility	~553
Nonimmigrant visas	~548
Statistics	~141
Travel documents	~109
Employment-based (EB)	~74
Appeals	~66

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "nshportun/usa-immigration-llama-3.2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert on U.S. immigration law. Answer accurately based on USCIS, 8 CFR, and BIA sources."},
    {"role": "user", "content": "What is the filing fee for Form I-485?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Data Sources

USCIS Policy Manual — primary_official
USCIS Forms & Instructions (I-130, I-485, I-765, N-400, I-589...) — primary_official
8 CFR / INA statute text — primary_official
BIA Precedent Decisions — primary_official
harshitha008/US-immigration-laws (Apache 2.0) — secondary_reputable
Law StackExchange immigration posts — community

Intended Use

RAG-based immigration legal assistants
Domain-specific LLM benchmarking
Immigration law Q&A research
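
For the RAG use case, retrieved passages have to be placed in the prompt alongside the question. The sketch below shows one possible way to assemble the chat messages. The helper `build_rag_messages`, the "Sources / Question" layout, and the three-passage cap are assumptions, not a format the model was trained on. Since the model was fine-tuned with a 512-token maximum input length, short passages are probably the safer choice.

```python
SYSTEM_PROMPT = (
    "You are an expert on U.S. immigration law. "
    "Answer accurately based on USCIS, 8 CFR, and BIA sources."
)


def build_rag_messages(question, passages, max_passages=3):
    """Return chat messages that place retrieved passages ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {text.strip()}"
        for i, text in enumerate(passages[:max_passages], start=1)
    )
    # Fall back to the bare question when nothing was retrieved.
    user_content = (
        f"Sources:\n{context}\n\nQuestion: {question}" if context else question
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the Usage section.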

Disclaimer

This model is for research and educational purposes only. It does not constitute legal advice. Immigration law is complex and changes frequently — always consult a licensed immigration attorney.

Downloads last month: 150

Safetensors

Model size: 3B params
Tensor type: BF16
