Instructions for using shehryars715/finetuned-Llama-3.1-8B-Instruct with libraries and local apps.

Libraries

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the 4-bit base model (needs bitsandbytes and a CUDA GPU), then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "shehryars715/finetuned-Llama-3.1-8B-Instruct")

Transformers

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="shehryars715/finetuned-Llama-3.1-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (AutoModelForCausalLM keeps the LM head needed for text generation)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("shehryars715/finetuned-Llama-3.1-8B-Instruct", dtype="auto")

Local Apps

vLLM

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "shehryars715/finetuned-Llama-3.1-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shehryars715/finetuned-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
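
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000; the helper names `build_chat_request` and `chat`, and the `max_tokens` default of 150, are illustrative choices rather than part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "shehryars715/finetuned-Llama-3.1-8B-Instruct"


def build_chat_request(prompt, model=MODEL_ID, max_tokens=150):
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000"):
    """POST the request to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        payload = json.load(resp)
    # The OpenAI-compatible response nests the reply under choices[0].message
    return payload["choices"][0]["message"]["content"]

# Example (needs the server to be running):
# print(chat("What is the capital of France?"))
```

Changing `base_url` to http://localhost:30000 should let the same helper talk to the SGLang server described in the next section, since both expose the same endpoint path.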

Use Docker

docker model run hf.co/shehryars715/finetuned-Llama-3.1-8B-Instruct

SGLang

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "shehryars715/finetuned-Llama-3.1-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shehryars715/finetuned-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "shehryars715/finetuned-Llama-3.1-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shehryars715/finetuned-Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shehryars715/finetuned-Llama-3.1-8B-Instruct to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for shehryars715/finetuned-Llama-3.1-8B-Instruct to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for shehryars715/finetuned-Llama-3.1-8B-Instruct to start chatting

Load model with FastModel

# Install first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="shehryars715/finetuned-Llama-3.1-8B-Instruct",
    max_seq_length=2048,
)

Docker Model Runner

How to use shehryars715/finetuned-Llama-3.1-8B-Instruct with Docker Model Runner:

```
docker model run hf.co/shehryars715/finetuned-Llama-3.1-8B-Instruct
```

🌾 Agricultural Advisory LLM — Llama 3.1 8B (Pakistan)

A LoRA fine-tuned version of Meta-Llama-3.1-8B-Instruct specialized for Pakistani crop farming advisory. The model answers general crop questions and interprets field sensor data (NDVI, EVI, NDWI, temperature, humidity) to provide concise, actionable farm advisories.

Model Details

Base model: meta-llama/Meta-Llama-3.1-8B-Instruct (4-bit quantized via Unsloth)
Fine-tuning method: LoRA (rank 16, alpha 16, RSLoRA enabled)
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters: ~0.53% of total
Hardware: NVIDIA Tesla T4 (14.56 GB VRAM)
Training time: ~36 minutes (2203s)
Peak VRAM: 14.28 GB
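
As a rough check on the trainable-parameter figure, the sketch below counts the LoRA weights that rank 16 adds to the seven target modules. The layer dimensions and the total parameter count are assumptions taken from the public Llama 3.1 8B configuration, not values stated in this card. It also compares the standard LoRA scaling factor (alpha / r) with the RSLoRA one (alpha / sqrt(r)).

```python
import math

# Llama 3.1 8B architecture figures (assumed from the public model config).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024                  # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
TOTAL_PARAMS = 8_030_261_248   # approximate size of the unquantized base model

RANK, ALPHA = 16, 16

# (input dim, output dim) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank=RANK):
    """LoRA adds A (rank x in) and B (out x rank) for every targeted matrix."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS


trainable = lora_param_count()
fraction = trainable / TOTAL_PARAMS
standard_scale = ALPHA / RANK            # classic LoRA scaling
rslora_scale = ALPHA / math.sqrt(RANK)   # rank-stabilized scaling

print(f"{trainable:,} trainable parameters ({fraction:.2%} of the base model)")
print(f"scaling factor: standard LoRA {standard_scale}, RSLoRA {rslora_scale}")
```

Under these assumptions the count comes to about 42 million parameters, close to the ~0.53% quoted above; the exact percentage depends on how the quantized base weights are counted.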

Training Details

Dataset

General Q&A: Synthetic agricultural advisories covering crops, topics, and questions relevant to Pakistani farming conditions
Farm-specific: Sensor-based advisories using field readings (NDVI, EVI, SAVI, MSAVI, NDWI, GNDVI, temperature, humidity, etc.)
Total examples: 4,730 mixed and shuffled records
Packed examples: ~335 per epoch (via Unsloth sequence packing)

Hyperparameters

Parameter	Value
Epochs	2
Learning rate	1e-4
LR scheduler	Cosine
Warmup ratio	0.1
Batch size (per device)	2
Gradient accumulation steps	4
Effective batch size	8
Weight decay	0.05
Max grad norm	0.3
Optimizer	AdamW 8-bit
Precision	bf16
Max sequence length	2048
Packing	Enabled
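
The effective batch size and the shape of the learning-rate curve follow from the values in the table. The sketch below is a plain-Python approximation of linear warmup followed by cosine decay, not the trainer's own scheduler. `TOTAL_STEPS = 84` is an illustrative figure only (about 335 packed sequences per epoch, effective batch size 8, 2 epochs); the card does not report the actual step count.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
BASE_LR = 1e-4
WARMUP_RATIO = 0.1


def effective_batch_size(num_devices=1):
    """Sequences contributing to each optimizer update."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def lr_at(step, total_steps):
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 84  # hypothetical; not reported in the card

print("effective batch size:", effective_batch_size())
for step in (0, 5, 9, 45, 84):
    print(f"step {step:>2}: lr = {lr_at(step, TOTAL_STEPS):.2e}")
```

With a warmup ratio of 0.1 the peak learning rate is reached after roughly the first tenth of training, and the rate then decays smoothly toward zero at the final step.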

Training Loss

Step	Loss
5	3.4473
10	1.6792
15	0.8774
20	0.7404
25	0.6289
30	0.6021
35	0.6064
40	0.5738
45	0.5489

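Training loss fell from 3.45 at step 5 to 0.55 at step 45 and changed little after step 30. No validation loss is reported, so the curve shows convergence on the training data rather than measured generalization.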
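
To see where the curve flattens, the logged values can be differenced step to step. This is a small sketch over the numbers in the table above; the helper name `deltas` is illustrative.

```python
# (step, training loss) pairs copied from the table above.
LOSS_LOG = [
    (5, 3.4473), (10, 1.6792), (15, 0.8774), (20, 0.7404), (25, 0.6289),
    (30, 0.6021), (35, 0.6064), (40, 0.5738), (45, 0.5489),
]


def deltas(log):
    """Change in loss between consecutive logged steps."""
    return [(s2, round(l2 - l1, 4)) for (_, l1), (s2, l2) in zip(log, log[1:])]


for step, change in deltas(LOSS_LOG):
    print(f"step {step:>2}: {change:+.4f}")
```

Most of the drop happens before step 20; from step 30 onward each 5-step interval changes the loss by less than 0.04.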

Usage

from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name    = "shehryars715/finetuned-Llama-3.1-8B-Instruct",
    max_seq_length= 2048,
    dtype         = None,
    load_in_4bit  = True,
)
FastLanguageModel.for_inference(model)

SYSTEM_PROMPT = (
    "You are an expert agricultural advisor specializing in Pakistani crop farming. "
    "You can answer general crop questions and also interpret field sensor data "
    "(NDVI, EVI, NDWI, temperature, humidity, etc.) to provide precise farm advisories. "
    "Answer accurately and concisely based on official recommendations and best practices. "
    "Keep answers under 3 sentences. Do not include citations, URLs, or markdown headers. "
    "Answer directly and stop."
)

def ask(crop, question, topic="General"):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": f"[Crop: {crop} | Topic: {topic}]\n{question}"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    with torch.no_grad():
        out = model.generate(
            input_ids=inputs, max_new_tokens=150,
            use_cache=True,
            # Enable sampling explicitly so temperature and top_p take effect
            do_sample=True, temperature=0.7, top_p=0.9,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True).strip()

print(ask("Maize", "How much seed is required per acre?"))
# Example output (sampling is stochastic, so wording varies between runs):
# 50-60 kg per acre for good stand at 40-45 thousand plants per acre.

Farm Sensor Advisory

def ask_farm(crop, stage, sensors: dict):
    sensor_str = "\n".join(f"{k}: {v}" for k, v in sensors.items())
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",   "content": (
            f"[Crop: {crop} | Stage: {stage}]\n"
            f"Field sensor readings:\n{sensor_str}\n\n"
            f"Provide a detailed farm advisory based on these readings."
        )},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda")

    with torch.no_grad():
        out = model.generate(
            input_ids=inputs, max_new_tokens=200,
            use_cache=True,
            # Enable sampling explicitly so temperature and top_p take effect
            do_sample=True, temperature=0.7, top_p=0.9,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True).strip()

print(ask_farm("Cotton", "Boll Formation", {"NDVI": 0.38, "temperature_c": 34, "relative_humidity": 55}))
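
It can help to inspect the prompt text before spending GPU time on generation. The sketch below rebuilds the user message in the same layout as `ask_farm` and adds basic range checks. The helper `build_farm_prompt` and its checks are illustrative additions, not part of the original pipeline; the checks rely only on NDVI being bounded to [-1, 1] and relative humidity to 0-100%.

```python
def build_farm_prompt(crop, stage, sensors):
    """Return the user message text in the layout ask_farm sends to the model."""
    ndvi = sensors.get("NDVI")
    if ndvi is not None and not -1.0 <= ndvi <= 1.0:
        raise ValueError(f"NDVI must lie in [-1, 1], got {ndvi}")
    humidity = sensors.get("relative_humidity")
    if humidity is not None and not 0 <= humidity <= 100:
        raise ValueError(f"relative_humidity must lie in [0, 100], got {humidity}")

    # One "name: value" line per reading, in insertion order
    sensor_str = "\n".join(f"{k}: {v}" for k, v in sensors.items())
    return (
        f"[Crop: {crop} | Stage: {stage}]\n"
        f"Field sensor readings:\n{sensor_str}\n\n"
        "Provide a detailed farm advisory based on these readings."
    )


readings = {"NDVI": 0.38, "temperature_c": 34, "relative_humidity": 55}
print(build_farm_prompt("Cotton", "Boll Formation", readings))
```

Rejecting out-of-range readings early keeps malformed sensor payloads from reaching the model, which matters because advisory quality depends on input data quality.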

Sample Outputs

Crop	Question	Answer
Maize	Which varieties are high-yielding in Pakistan?	SH-32, PKV-1, Khyber-2002, and Pehlu-6 are high-yielding, disease-resistant hybrids widely grown across Pakistan.
Maize	How much seed per acre?	50-60 kg per acre for good stand at 40-45 thousand plants per acre.
Maize	NDVI 0.42 — is that healthy?	Moderate stress. Look for uniformity — a hotspot indicates disease or pest issue.
Cotton	Pesticide for whitefly?	Use neonicotinoid seed treatments or foliar applications of imidacloprid or acetamiprid. Practice good sanitation and remove weeds that harbor nymphs.

Comparison with Qwen2.5-7B Fine-tune

Metric	Llama 3.1 8B	Qwen 2.5 7B
Final step loss	0.55	0.63
Training time	36 min	31 min
Peak VRAM	14.28 GB	12.02 GB
Epochs	2	1
Answer style	Concise + actionable	Concise + technical

Limitations

Trained on synthetic data — real-world agronomic validation recommended before deployment
Pakistan-specific; recommendations may not transfer to other regions
Sensor advisory accuracy depends on data quality and crop stage alignment
VRAM usage is near T4 ceiling — do not increase batch size without gradient checkpointing
Not a substitute for consultation with local agricultural extension services

Authors

Developed for the AgroBot-Research project.

Model tree for shehryars715/finetuned-Llama-3.1-8B-Instruct

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Adapter: this model