# Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB

- **Student:** Qwen/Qwen3-1.7B
- **Teacher:** Qwen/Qwen3-4B-SafeRL
- **KD method:** TED
- **Scenario:** OOB (out-of-box: neither teacher nor student was fine-tuned on DIA-GUARD before KD)

Part of the DIA-GUARD dialect-aware safety classifier suite. This checkpoint was produced by distilling the off-the-shelf Qwen3-4B-SafeRL safety teacher into the smaller Qwen3-1.7B student on 50,000 dialect-stratified samples from the DIA-GUARD train split, and evaluated on the full 181,874-sample dialect holdout test set.
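"Dialect-stratified" here means the training subset is drawn with balanced quotas per dialect group. The exact sampling procedure is not documented in this card; the sketch below is a hypothetical helper (the `stratified_sample` name and its signature are assumptions, not part of the released code) showing one common way to do it.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, n_total, seed=0):
    """Hypothetical helper: draw n_total rows with equal per-group quotas.

    rows   -- list of dicts, each carrying a group label under `key`
    key    -- field to stratify on (e.g. "dialect")
    n_total -- total number of samples to draw
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    per_group = n_total // len(groups)
    sampled = []
    for members in groups.values():
        # Take the quota, or everything if the group is smaller than it.
        sampled.extend(rng.sample(members, min(per_group, len(members))))
    return sampled
```

Quota-based stratification keeps low-resource dialects from being swamped by majority dialects in the 50K subset.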

## Test Set Results

Precision, recall, and F1 below are macro-averaged over the two classes.

| Metric | Value |
|---|---|
| Accuracy | 0.5758 |
| Precision (macro) | 0.6299 |
| Recall (macro) | 0.5980 |
| F1 (macro) | 0.5567 |
| Test samples | 181,874 |

### Per-class breakdown

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| safe | 0.5219 | 0.8570 | 0.6487 | 83,140 |
| unsafe | 0.7379 | 0.3391 | 0.4646 | 98,734 |

### Confusion matrix

| | Predicted safe | Predicted unsafe |
|---|---|---|
| Actual safe | TN=71,247 | FP=11,893 |
| Actual unsafe | FN=65,258 | TP=33,476 |
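The headline and per-class numbers follow directly from the confusion matrix (here "positive" means the class under consideration), which makes the card self-checking:

```python
# Recompute the reported metrics from the confusion matrix entries.
tn, fp = 71_247, 11_893   # actual safe
fn, tp = 65_258, 33_476   # actual unsafe

total = tn + fp + fn + tp                          # 181,874 test samples
accuracy = (tn + tp) / total                       # -> 0.5758

# Treating "safe" as the positive class:
p_safe = tn / (tn + fn)                            # -> 0.5219
r_safe = tn / (tn + fp)                            # -> 0.8570
f1_safe = 2 * p_safe * r_safe / (p_safe + r_safe)  # -> 0.6487

# Treating "unsafe" as the positive class:
p_unsafe = tp / (tp + fp)                          # -> 0.7379
r_unsafe = tp / (tp + fn)                          # -> 0.3391
f1_unsafe = 2 * p_unsafe * r_unsafe / (p_unsafe + r_unsafe)  # -> 0.4646

# Macro averages match the headline table:
macro_precision = (p_safe + p_unsafe) / 2          # -> 0.6299
```

The recall asymmetry (0.8570 on safe vs. 0.3391 on unsafe) shows the student defaults heavily toward predicting "safe" on this out-of-box holdout.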

## Training Setup

| Setting | Value |
|---|---|
| Method | TED |
| Teacher | Qwen/Qwen3-4B-SafeRL |
| Student base | Qwen/Qwen3-1.7B |
| Train data | 50,000 dialect-stratified DIA-GUARD samples |
| Epochs | 1 |
| Framework | HuggingFace transformers + accelerate |
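TED adds task-aware layer-wise components beyond what this card documents, so the sketch below is not the TED objective itself; it is a minimal temperature-scaled logit-distillation loss (the common core of logit KD) for intuition about how a teacher's output distribution supervises the student. All names here are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T):
    """Numerically stable softmax over the last axis at temperature T."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def logit_kd_loss(student_logits, teacher_logits, T=2.0):
    """Temperature-scaled KL(teacher || student), averaged over positions.

    The T**2 factor keeps gradient magnitudes comparable across temperatures,
    as in standard knowledge distillation.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(T * T * kl.mean())
```

When the student matches the teacher exactly the loss is zero; any divergence in the predicted distribution is penalized, pulling the 1.7B student toward the 4B teacher's safety judgments.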

## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB"
)
model = AutoModelForCausalLM.from_pretrained(
    "jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB",
    torch_dtype="bfloat16",
)

prompt = "Classify the following text as safe or unsafe.\n\nText: Hello\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
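Since the decoded output includes the prompt plus a free-form continuation, downstream code needs to pull the label out of the suffix. A small hypothetical helper (`extract_label` is an assumption, not part of the released code, and the model's exact output format may vary):

```python
def extract_label(decoded: str, prompt: str) -> str:
    """Hypothetical helper: read the safe/unsafe label from a decoded generation.

    Strips the echoed prompt, then checks which label the continuation
    starts with. Returns "unknown" if neither label is found.
    """
    completion = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    completion = completion.strip().lower()
    for label in ("unsafe", "safe"):  # check "unsafe" first; "safe" is its suffix
        if completion.startswith(label):
            return label
    return "unknown"
```

For production use, constraining decoding to the two label tokens (or comparing their logits directly) is more robust than parsing free-form text.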
