USA Immigration Law - Llama 3.2 3B
Fine-tuned from meta-llama/Llama-3.2-3B-Instruct on the nshportun/usa-immigration-law-qa dataset: 17,058 source-grounded Q&A pairs covering all major U.S. immigration subdomains.
Training Details
| Setting | Value |
|---|---|
| Base model | Llama 3.2 3B Instruct |
| Method | LoRA (r=8, alpha=32, merged into base weights) |
| Training pairs | 16,065 |
| Eval pairs | 993 (stratified across 13 subdomains) |
| Epochs | 1 |
| Batch size | 1 per device (int8 quantization) |
| Learning rate | 1e-4 |
| Max input length | 512 tokens |
| Infrastructure | AWS SageMaker ml.g5.2xlarge (24GB VRAM) |
| Train loss | 0.894 |
| Eval loss | 0.903 |
| Eval perplexity | 2.47 |
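The eval perplexity in the table is consistent with the exponential of the eval loss. This assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention but is not stated in this card. A quick check:

```python
import math

eval_loss = 0.903  # eval loss from the table above

# Assumption: the loss is mean per-token cross-entropy in nats
perplexity = math.exp(eval_loss)

print(f"{perplexity:.2f}")  # 2.47
```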
Benchmark Results
Evaluated on a stratified random sample of 101 questions from the held-out eval set, covering all 13 immigration subdomains plus a General category (14 rows in the tables below). Answers were scored 0-3 by an LLM judge (Claude Sonnet 4.6) against reference answers from official sources.
Scoring scale: 0 = wrong/hallucinated · 1 = partially correct · 2 = mostly correct · 3 = fully correct
Evaluation date: 2026-05-17
Judge model: us.anthropic.claude-sonnet-4-6 (Amazon Bedrock)
Eval set source: nshportun/usa-immigration-law-qa, split=eval, seed=42
Fine-tuned model inference: local CPU (transformers 5.8.1, bfloat16, device_map=cpu)
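A seeded stratified draw like the one described above could look roughly like the sketch below. It is not the project's actual sampling code. The cap of 8 questions per subdomain is inferred from the N columns in the result tables, the `subdomain` field name is an assumption, and the toy records are invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(records, per_group=8, seed=42):
    """Draw up to `per_group` records from each subdomain, reproducibly."""
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    groups = defaultdict(list)
    for rec in records:
        groups[rec["subdomain"]].append(rec)  # "subdomain" is an assumed field name
    sample = []
    for name in sorted(groups):  # fixed iteration order keeps the draw deterministic
        pool = groups[name]
        k = min(per_group, len(pool))  # small subdomains contribute all they have
        sample.extend(rng.sample(pool, k))
    return sample

# Toy data: hypothetical records, not the real eval set
labels = ["asylum"] * 20 + ["appeals"] * 3 + ["removal"] * 12
records = [{"id": i, "subdomain": s} for i, s in enumerate(labels)]

picked = stratified_sample(records)
print(len(picked))  # 8 + 3 + 8 = 19
```

Using a local `random.Random(seed)` instead of the global `random.seed` keeps the draw reproducible even if other code consumes random numbers first.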
Overall Scores
| Model | Mean Score (0-3) | % Fully Correct (score=3) | N |
|---|---|---|---|
| Llama 3.2 3B fine-tuned (this model) | 0.68 | 7.9% | 101 |
| Claude Sonnet 4.6 zero-shot | 1.47 | 25.7% | 101 |
| Llama 3 8B zero-shot (base family) | 0.80 | 2.0% | 101 |
Why baselines matter: Claude Sonnet 4.6 is a frontier model far larger than this 3B model (its parameter count is not published). Llama 3 8B zero-shot achieves only 2.0% fully correct (2 of 101 questions) on these domain-specific questions, which establishes the difficulty of the task. The fine-tuned 3B model achieves 7.9% fully correct (8 of 101), outperforming the zero-shot 8B baseline on that metric despite being about 2.7x smaller.
By Subdomain: Llama 3.2 3B Fine-tuned (this model)
| Subdomain | Mean Score | % Fully Correct | N |
|---|---|---|---|
| Travel documents | 1.83 | 33.3% | 6 |
| Naturalization | 1.13 | 25.0% | 8 |
| Statistics | 1.13 | 12.5% | 8 |
| Appeals | 1.00 | 0.0% | 3 |
| Nonimmigrant visas | 0.88 | 12.5% | 8 |
| Adjustment of status | 0.75 | 0.0% | 8 |
| Employment authorization | 0.75 | 12.5% | 8 |
| Asylum | 0.50 | 12.5% | 8 |
| Admissibility | 0.38 | 0.0% | 8 |
| Family-based immigration | 0.38 | 0.0% | 8 |
| Humanitarian | 0.38 | 0.0% | 8 |
| Removal | 0.38 | 0.0% | 8 |
| General | 0.25 | 0.0% | 8 |
| Employment-based (EB) | 0.00 | 0.0% | 4 |
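The overall figures for the fine-tuned model can be recovered from the subdomain table above. Because each answer receives an integer score from 0 to 3, multiplying a subdomain's mean by its N and rounding should give the integer score total for that subdomain. The sketch below does that as a consistency check; it is not part of the benchmark scripts.

```python
# (mean score, % fully correct, N) per subdomain, copied from the table above
rows = [
    (1.83, 33.3, 6), (1.13, 25.0, 8), (1.13, 12.5, 8), (1.00, 0.0, 3),
    (0.88, 12.5, 8), (0.75, 0.0, 8), (0.75, 12.5, 8), (0.50, 12.5, 8),
    (0.38, 0.0, 8), (0.38, 0.0, 8), (0.38, 0.0, 8), (0.38, 0.0, 8),
    (0.25, 0.0, 8), (0.00, 0.0, 4),
]

n_total = sum(n for _, _, n in rows)

# Undo the two-decimal rounding of each mean by rounding mean * N to an integer
score_sum = sum(round(mean * n) for mean, _, n in rows)
full_count = sum(round(pct / 100 * n) for _, pct, n in rows)

print(n_total, score_sum, full_count)  # 101 69 8
print(round(score_sum / n_total, 2))   # 0.68
print(round(100 * full_count / n_total, 1))  # 7.9
```

Averaging the rounded subdomain means directly gives a slightly higher value (about 0.69), which is why the integer totals are reconstructed first.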
By Subdomain: Claude Sonnet 4.6 Zero-Shot Baseline
| Subdomain | Mean Score | % Fully Correct | N |
|---|---|---|---|
| Travel documents | 2.33 | 33.3% | 6 |
| Adjustment of status | 2.25 | 62.5% | 8 |
| Humanitarian | 2.13 | 50.0% | 8 |
| Asylum | 2.00 | 50.0% | 8 |
| Admissibility | 1.50 | 25.0% | 8 |
| Naturalization | 1.50 | 25.0% | 8 |
| Nonimmigrant visas | 1.50 | 25.0% | 8 |
| Removal | 1.25 | 12.5% | 8 |
| Statistics | 1.25 | 12.5% | 8 |
| Family-based immigration | 1.13 | 12.5% | 8 |
| Appeals | 1.00 | 0.0% | 3 |
| Employment authorization | 0.75 | 12.5% | 8 |
| Employment-based (EB) | 0.75 | 25.0% | 4 |
| General | 0.75 | 0.0% | 8 |
By Subdomain: Llama 3 8B Zero-Shot Baseline
| Subdomain | Mean Score | % Fully Correct | N |
|---|---|---|---|
| Adjustment of status | 1.25 | 0.0% | 8 |
| Travel documents | 1.17 | 0.0% | 6 |
| Asylum | 1.13 | 12.5% | 8 |
| Removal | 0.88 | 0.0% | 8 |
| Statistics | 0.88 | 0.0% | 8 |
| Humanitarian | 0.75 | 12.5% | 8 |
| Naturalization | 0.75 | 0.0% | 8 |
| Admissibility | 0.75 | 0.0% | 8 |
| Nonimmigrant visas | 0.75 | 0.0% | 8 |
| Employment authorization | 0.63 | 0.0% | 8 |
| General | 0.63 | 0.0% | 8 |
| Employment-based (EB) | 0.50 | 0.0% | 4 |
| Family-based immigration | 0.50 | 0.0% | 8 |
| Appeals | 0.33 | 0.0% | 3 |
Key Observations
- The task is genuinely hard: Even Claude Sonnet 4.6 (a frontier model) scores only 1.47/3.0 mean and 25.7% fully-correct. This reflects the highly specific, citation-level precision required by immigration procedural questions.
- Fine-tuning boosts the fully-correct rate: The 3B fine-tuned model achieves 7.9% fully correct vs. 2.0% for the zero-shot 8B baseline (8 vs. 2 questions out of 101), roughly a 4x improvement on exact correctness after 1 epoch of domain training, despite being about 2.7x smaller.
- Strongest subdomains for the fine-tuned model: travel documents (1.83), naturalization (1.13), statistics (1.13). Naturalization is well represented in the dataset (~2,670 pairs), but travel documents (~109) and statistics (~141) are among the smallest subdomains, so dataset size alone does not explain these scores.
- Weakest subdomains: employment-based (0.00), general (0.25), and four subdomains tied at 0.38 (admissibility, family-based immigration, humanitarian, removal). These topics often require cross-referencing multiple USCIS form instructions or policy details.
- Room for improvement: The fine-tuned model's mean score (0.68) is below that of the zero-shot 8B baseline (0.80), suggesting that either 1 epoch of training is insufficient or the model needs more specific instruction tuning rather than completion-style fine-tuning.
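The ratios quoted in these observations follow from the overall table. The short check below uses the question counts implied by the percentages (8 and 2 out of 101) and nominal parameter counts of 8B and 3B; the exact parameter counts are an approximation.

```python
n = 101
finetuned_full, baseline_full = 8, 2  # counts implied by 7.9% and 2.0% of 101

rate_ratio = (finetuned_full / n) / (baseline_full / n)
size_ratio = 8 / 3  # nominal sizes in billions of parameters (8B vs. 3B)
mean_gap = 0.68 - 0.80  # fine-tuned mean minus zero-shot 8B mean

print(f"fully-correct ratio: {rate_ratio:.1f}x")  # 4.0x
print(f"size ratio: {size_ratio:.1f}x")           # 2.7x
print(f"mean-score gap: {mean_gap:+.2f}")         # -0.12
```

The two metrics point in opposite directions: the fine-tuned model is ahead on fully correct answers but behind on mean score.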
Reproducing the Benchmark
# Clone the repo and install dependencies
git clone https://github.com/nshportun/usa-immigration
cd usa-immigration
pip install -r requirements.txt
# Set environment variables (AWS Bedrock for baseline models + judge)
export ACCOUNT2_AWS_ACCESS_KEY_ID=...
export ACCOUNT2_AWS_SECRET_ACCESS_KEY=...
# Run baseline benchmark (Claude Sonnet + Llama 3 8B via Bedrock)
python scripts/benchmark/run_benchmark.py
# Run fine-tuned model inference on CPU (requires model artifacts locally)
# Download from: https://huggingface.co/nshportun/usa-immigration-llama-3.2-3b
python scripts/benchmark/run_local_finetuned.py
# Results written to:
# data_local/benchmark/results.jsonl (per-question scores)
# data_local/benchmark/summary.json (aggregate table)
The benchmark script supports resuming: it skips already-scored questions.
random.seed(42) ensures the same 101-question sample is selected each run.
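The resume behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `run_benchmark.py`. The `question_id` and `score` field names, the helper names, and the stand-in scorer are all assumptions.

```python
import json
import os
import tempfile

def load_scored_ids(path):
    """Return the set of question IDs already present in a JSONL results file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["question_id"] for line in f if line.strip()}

def run_with_resume(questions, path, score_fn):
    """Score only questions missing from `path`; return how many were newly scored."""
    done = load_scored_ids(path)
    new_count = 0
    with open(path, "a", encoding="utf-8") as f:
        for q in questions:
            if q["question_id"] in done:
                continue  # already scored in an earlier run
            record = {"question_id": q["question_id"], "score": score_fn(q)}
            f.write(json.dumps(record) + "\n")
            f.flush()  # keep progress on disk if the run is interrupted
            new_count += 1
    return new_count

# Toy run with a stand-in scorer (hypothetical; the real judge is an LLM call)
questions = [{"question_id": i} for i in range(5)]
results_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
first = run_with_resume(questions[:3], results_path, lambda q: 1)
second = run_with_resume(questions, results_path, lambda q: 1)
print(first, second)  # 3 2
```

Appending one JSON line per question means an interrupted run loses at most the question in flight.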
Immigration Subdomains Covered
| Subdomain | QA Pairs |
|---|---|
| Family-based immigration | ~3,987 |
| Naturalization | ~2,670 |
| Asylum | ~2,094 |
| Adjustment of status | ~1,727 |
| Removal | ~1,277 |
| Humanitarian | ~894 |
| Employment authorization | ~832 |
| Admissibility | ~553 |
| Nonimmigrant visas | ~548 |
| Statistics | ~141 |
| Travel documents | ~109 |
| Employment-based (EB) | ~74 |
| Appeals | ~66 |
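Summing the approximate counts in the table above gives a sense of how the dataset is distributed. The listed subdomains account for about 14,972 of the 17,058 pairs. The card does not say where the remaining pairs fall. The benchmark tables include a General category that may cover them, but that is an assumption.

```python
# Approximate QA-pair counts copied from the table above
pairs = {
    "Family-based immigration": 3987,
    "Naturalization": 2670,
    "Asylum": 2094,
    "Adjustment of status": 1727,
    "Removal": 1277,
    "Humanitarian": 894,
    "Employment authorization": 832,
    "Admissibility": 553,
    "Nonimmigrant visas": 548,
    "Statistics": 141,
    "Travel documents": 109,
    "Employment-based (EB)": 74,
    "Appeals": 66,
}

total = 17058  # dataset size stated at the top of this card
listed = sum(pairs.values())
print(listed, total - listed)  # 14972 2086

# Share of the full dataset per listed subdomain, largest first
for name, count in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(f"{name:26s} {100 * count / total:5.1f}%")
```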
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nshportun/usa-immigration-llama-3.2-3b"

# Load the tokenizer and the merged model weights in bfloat16.
# Recent transformers releases accept `dtype`; older ones use `torch_dtype`.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map="auto")

# Chat-format prompt: a system instruction plus one user question
messages = [
    {"role": "system", "content": "You are an expert on U.S. immigration law. Answer accurately based on USCIS, 8 CFR, and BIA sources."},
    {"role": "user", "content": "What is the filing fee for Form I-485?"},
]

# Render the chat template to text, then tokenize and move to the model's device
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding; print only the newly generated tokens
out = model.generate(**inputs, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Data Sources
- USCIS Policy Manual (primary_official)
- USCIS Forms & Instructions (I-130, I-485, I-765, N-400, I-589...) (primary_official)
- 8 CFR / INA statute text (primary_official)
- BIA Precedent Decisions (primary_official)
- harshitha008/US-immigration-laws (Apache 2.0) (secondary_reputable)
- Law StackExchange immigration posts (community)
Intended Use
- RAG-based immigration legal assistants
- Domain-specific LLM benchmarking
- Immigration law Q&A research
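For the RAG use case listed above, retrieved passages might be placed in the user turn ahead of the question. The sketch below is one possible prompt layout, not a recommended or tested one. The function name, the context format, and the passage limit are assumptions. The system prompt is the one from the Usage section. Since training used a 512-token maximum input length, keeping the context short is probably sensible.

```python
SYSTEM_PROMPT = (
    "You are an expert on U.S. immigration law. "
    "Answer accurately based on USCIS, 8 CFR, and BIA sources."
)

def build_rag_messages(question, passages, max_passages=3):
    """Assemble chat messages that place retrieved passages ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {p}" for i, p in enumerate(passages[:max_passages], start=1)
    )
    user_content = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]

# Placeholder passage text; a real system would retrieve it from a document store
messages = build_rag_messages(
    "What is the filing fee for Form I-485?",
    ["<retrieved passage text goes here>"],
)
print(messages[1]["content"])
```

The resulting `messages` list has the same shape as the one passed to `apply_chat_template` in the Usage section.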
Disclaimer
This model is for research and educational purposes only. It does not constitute legal advice. Immigration law is complex and changes frequently; always consult a licensed immigration attorney.