Gemma-4-E2B-IT-SFT-RLVR-Medical

Gemma-4-E2B-it fine-tuned on PubMedQA using supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR).
Also check out the training code on GitHub.

Setup

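# Install once (from_pretrained also needs huggingface-hub to download the file):
# !pip install llama-cpp-python huggingface-hub
from llama_cpp import Llama

# Download the Q4_K_M GGUF file from the Hugging Face Hub and load it.
llm = Llama.from_pretrained(
    repo_id="lukasdrews/Gemma-4-E2B-IT-SFT-RLVR-Medical-GGUF",
    filename="gemma-4-E2B-it-sft-rlvr-medical-Q4_K_M.gguf",
    verbose=False,
)

# A single-turn chat request containing one user question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Do GEC produce and bear factor H under complement attack?"}
        ]
    },
]

# Generate up to 1024 tokens and print the assistant's reply.
outputs = llm.create_chat_completion(messages, max_tokens=1024)
print(outputs["choices"][0]["message"]["content"])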
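
PubMedQA labels each question as yes, no, or maybe, so it can be handy to reduce the free-text reply to one of those three labels. The helper below is a minimal sketch of one way to do that. The name `extract_decision` and the "last label mentioned wins" heuristic are assumptions for illustration only; the model's actual answer format is not documented here, so check a few real replies before relying on it.

```python
import re
from typing import Optional


def extract_decision(text: str) -> Optional[str]:
    """Return the last yes/no/maybe that appears as a whole word, or None.

    Hypothetical helper: assumes the final label mentioned in the reply
    is the model's decision, which may not hold for every output.
    """
    matches = re.findall(r"\b(yes|no|maybe)\b", text.lower())
    return matches[-1] if matches else None


# Usage with the reply generated above (not executed here):
# decision = extract_decision(outputs["choices"][0]["message"]["content"])
```

A reply that mentions none of the three labels yields `None`, which lets the caller count it as unanswered instead of guessing.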

Benchmarks

| Model | Quantization | PubMedQA (In-Domain) | MedQA-USMLE (Zero-Shot Transfer) |
| --- | --- | --- | --- |
| Gemma-4-E2B-it (base model) | - | 58.10 % | 29.54 % |
| Gemma-4-E2B-it + SFT + RLVR | - | 73.10 % | 43.05 % |
| Gemma-4-E2B-it + SFT + RLVR | Q8_0 | 72.40 % | 43.00 % |
| Gemma-4-E2B-it + SFT + RLVR | Q6_K | 72.10 % | 42.18 % |
| Gemma-4-E2B-it + SFT + RLVR | Q5_K_M | 72.00 % | 38.88 % |
| Gemma-4-E2B-it + SFT + RLVR | Q4_K_M | 71.80 % | 38.88 % |
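
To compare quantization levels, the accuracy lost relative to the unquantized fine-tuned model can be computed directly from the benchmark table above. The snippet below is a small sketch of that arithmetic; the dictionary layout and the function name `drop_vs_unquantized` are illustrative choices, and the numbers are copied from the table.

```python
# Accuracy in percent, copied from the benchmark table:
# (PubMedQA, MedQA-USMLE) for the fine-tuned model at each quantization.
results = {
    "unquantized": (73.10, 43.05),
    "Q8_0": (72.40, 43.00),
    "Q6_K": (72.10, 42.18),
    "Q5_K_M": (72.00, 38.88),
    "Q4_K_M": (71.80, 38.88),
}


def drop_vs_unquantized(results):
    """Return the accuracy drop in percentage points for each quantization."""
    ref_pub, ref_med = results["unquantized"]
    return {
        name: (round(ref_pub - pub, 2), round(ref_med - med, 2))
        for name, (pub, med) in results.items()
        if name != "unquantized"
    }


drops = drop_vs_unquantized(results)
for name, (d_pub, d_med) in drops.items():
    print(f"{name}: PubMedQA -{d_pub:.2f} pp, MedQA-USMLE -{d_med:.2f} pp")
```

On these figures, PubMedQA accuracy falls by at most 1.30 percentage points (Q4_K_M), while the MedQA-USMLE drop grows from 0.87 points at Q6_K to 4.17 points at Q5_K_M and Q4_K_M.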

Format: GGUF. Model size: 5B params. Architecture: gemma4.