PK-Genesis MedGemma 4B – Pharmaceutical AI


PK-Genesis is a domain-specific pharmaceutical AI built by PharmKulen, fine-tuned from Google's MedGemma 4B IT using QLoRA on 70,000+ pharmacy-specific training records.

This is Run 2 of 10 in the curriculum training pipeline. The model improves with each successive run.

What It Does

PK-Genesis is designed to assist pharmacists and patients with:

  • Drug Information – dosage, indications, contraindications, side effects
  • Drug Interactions – checking the safety of medication combinations
  • Prescription Understanding – explaining prescriptions in simple terms
  • Patient Counseling – medication guidance in plain language
  • Clinical Reasoning – step-by-step analysis of pharmaceutical scenarios
  • Safety Awareness – recognizing emergencies and scope limitations
  • Multilingual Support – English, Chinese (中文), Khmer (ខ្មែរ), French, Russian, Korean

Training Details

Base Model

  • Model: google/medgemma-4b-it (Google's MedGemma 4B instruction-tuned model; the examples below load it via unsloth/medgemma-4b-it)

Fine-Tuning Method

  • Method: QLoRA (Quantized Low-Rank Adaptation)
  • Framework: Unsloth FastVisionModel + HuggingFace PEFT + TRL SFTTrainer
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Target Modules: All linear layers (attention + MLP)
  • Trainable Parameters: ~65.5M (out of 4B total)
  • Sequence Length: 1024 tokens
  • Optimizer: AdamW 8-bit
  • Gradient Checkpointing: Enabled
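
The hyperparameters above roughly correspond to a PEFT `LoraConfig` along the following lines. This is a sketch under stated assumptions, not the released training script: the module names assume a Gemma-style architecture, and the actual run used Unsloth's `FastVisionModel` wrappers.

```python
from peft import LoraConfig

# Sketch of a LoRA config matching the listed hyperparameters.
# Module names are assumptions based on the Gemma architecture;
# the actual training code may target them differently.
lora_config = LoraConfig(
    r=32,              # LoRA rank
    lora_alpha=64,     # LoRA alpha (effective scaling = alpha / r = 2.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
)
```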

Curriculum Training (10 Runs)

PK-Genesis uses curriculum learning – training data is organized from general to specialized, with decreasing learning rates:

| Run | Data | Records | LR | Status |
|-----|------|---------|------|--------|
| 1 | EN core pharmacy (part 1) | ~9.6K | 2e-4 | Completed |
| 2 | EN core pharmacy (part 2) | ~9.6K | 2e-4 | Completed |
| 3 | EN extended (FDA/WHO/RxNorm, part 1) | ~9.2K | 1.5e-4 | Pending |
| 4 | EN extended (part 2) | ~9.2K | 1.5e-4 | Pending |
| 5 | EN extended (part 3) | ~9.2K | 1.5e-4 | Pending |
| 6 | EN extended (part 4) | ~9.2K | 1.5e-4 | Pending |
| 7 | Multilingual ZH/FR/RU/KO (part 1) | ~7.8K | 1e-4 | Pending |
| 8 | Multilingual (part 2) | ~7.8K | 1e-4 | Pending |
| 9 | Multilingual (part 3) | ~7.8K | 1e-4 | Pending |
| 10 | Khmer + Identity + Safety | ~5.5K | 5e-5 | Pending |

Anti-forgetting: 20% of the English core data is replayed in batches 2-4 to prevent catastrophic forgetting.
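
The decreasing learning-rate schedule and the 20% replay mix can be sketched in plain Python. The dataset labels and the `with_replay` helper are illustrative, not part of the released training code:

```python
import random

# Illustrative curriculum schedule: (run, dataset label, ~records, learning rate).
SCHEDULE = [
    (1,  "en_core_part1",         9_600, 2e-4),
    (2,  "en_core_part2",         9_600, 2e-4),
    (3,  "en_extended_part1",     9_200, 1.5e-4),
    (4,  "en_extended_part2",     9_200, 1.5e-4),
    (5,  "en_extended_part3",     9_200, 1.5e-4),
    (6,  "en_extended_part4",     9_200, 1.5e-4),
    (7,  "multilingual_part1",    7_800, 1e-4),
    (8,  "multilingual_part2",    7_800, 1e-4),
    (9,  "multilingual_part3",    7_800, 1e-4),
    (10, "khmer_identity_safety", 5_500, 5e-5),
]

def with_replay(run_records, core_records, replay_fraction=0.2, seed=0):
    """Mix a fraction of earlier core data into a run to limit forgetting."""
    rng = random.Random(seed)
    n_replay = int(len(run_records) * replay_fraction)
    replay = rng.sample(core_records, min(n_replay, len(core_records)))
    mixed = list(run_records) + replay
    rng.shuffle(mixed)  # interleave replayed and new examples
    return mixed
```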

Training Data (70,000+ Records)

| Source | Records | Description |
|--------|---------|-------------|
| EN Core Pharmacy | ~19.2K | Drug monographs, interactions, dosing, counseling |
| OpenFDA | ~29.9K | FDA adverse events, drug labels, recalls |
| WHO Essential Medicines | 553 | WHO model list with clinical guidance |
| RxNorm | 1,309 | Drug nomenclature and relationships |
| Chain-of-Thought Reasoning | ~2K | Step-by-step clinical reasoning scenarios |
| Safety & Disclaimers | ~500 | Refusal patterns, emergency recognition, scope awareness |
| Identity | ~1.5K | PK-Genesis identity and personality |
| Multilingual (ZH/FR/RU/KO) | ~23.3K | Translated pharmacy knowledge |
| Khmer Pharmacy | ~5K | Cambodia-specific pharmaceutical data |

All training data is in chat/messages format compatible with the Gemma 3 chat template.
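
For reference, a single record in this chat/messages format looks roughly like the following. The drug-interaction content is an invented illustration, not an actual record from the dataset:

```python
# Hypothetical example of one training record in the standard
# "role"/"content" messages format used by the Gemma 3 chat template.
record = {
    "messages": [
        {"role": "user", "content": "Can I take ibuprofen with warfarin?"},
        {"role": "assistant", "content": (
            "Combining ibuprofen with warfarin increases bleeding risk. "
            "Please consult your pharmacist or doctor before combining them."
        )},
    ]
}
```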

Usage

Requirements

```bash
pip install transformers peft torch accelerate bitsandbytes
```

Loading the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/medgemma-4b-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/medgemma-4b-it")

# Load PK-Genesis adapter (this checkpoint)
model = PeftModel.from_pretrained(base_model, "pharmkulen/pk-genesis-medgemma-run02")
model.eval()
```

Chat Example

```python
messages = [
    {"role": "user", "content": "What are the common side effects of metformin?"}
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True
).to(model.device)

# Gemma 3 requires token_type_ids
inputs["token_type_ids"] = torch.zeros_like(inputs["input_ids"])

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

With Unsloth (Faster Inference)

```python
from unsloth import FastVisionModel
from peft import PeftModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/medgemma-4b-it",
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(model, "pharmkulen/pk-genesis-medgemma-run02")
FastVisionModel.for_inference(model)
```

Intended Use

Primary Use Cases

  • Pharmacy management systems (drug info lookup, interaction checking)
  • Patient-facing medication counseling chatbots
  • Pharmacist decision support tools
  • Medical education and training aids
  • Multilingual pharmacy assistance in Southeast Asia

Out of Scope

  • Not for clinical diagnosis – PK-Genesis is a pharmacy assistant, not a diagnostic tool
  • Not a replacement for healthcare professionals – always consult qualified pharmacists/doctors
  • Not validated for life-critical decisions – do not rely on this model for emergency medical decisions

Limitations & Safety

  • This model may produce incorrect or outdated drug information
  • Always verify critical medical information with official sources (FDA, WHO, local formularies)
  • The model will attempt to refuse diagnostic requests and redirect to professionals
  • Khmer language performance is still limited at Run 2 (improves in Runs 7-10)
  • Vision capabilities (medicine label reading) are not yet fine-tuned

All Checkpoints

| Run | HuggingFace Repository |
|-----|------------------------|
| Run 1 | pharmkulen/pk-genesis-medgemma-run01 |
| Run 2 | pharmkulen/pk-genesis-medgemma-run02 (this model) |
| Run 3 | pharmkulen/pk-genesis-medgemma-run03 |
| Runs 4-10 | Coming soon |

About PharmKulen

PharmKulen is an AI-powered pharmacy management and medicine search platform serving 120+ pharmacies across Cambodia. We help patients find medicines at nearby pharmacies with real-time availability in 6 languages, and provide pharmacy owners with digital tools for inventory, sales, and AI-assisted operations.

Contact: contact@pharmkulen.com
Website: pharmkulen.com

Citation

```bibtex
@misc{pk-genesis-medgemma-2026,
  title={PK-Genesis: Domain-Specific Pharmaceutical AI Fine-Tuned from MedGemma 4B},
  author={Salakhitdinov, Khidayotullo},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/pharmkulen/pk-genesis-medgemma-run02}
}
```