# Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB

- **Student:** Qwen/Qwen3-1.7B
- **Teacher:** Qwen/Qwen3-4B-SafeRL
- **KD method:** TED
- **Scenario:** OOB (out-of-box: neither teacher nor student was fine-tuned on DIA-GUARD before KD)
Part of the DIA-GUARD dialect-aware safety classifier suite. This checkpoint was produced by distilling the off-the-shelf Qwen3-4B-SafeRL safety teacher into the smaller Qwen3-1.7B student on 50,000 dialect-stratified samples from the DIA-GUARD train split, then evaluating on the full 181,874-sample dialect holdout test set.
## Test Set Results

Precision, recall, and F1 below are macro-averaged over the two classes.
| Metric | Value |
|---|---|
| Accuracy | 0.5758 |
| Precision | 0.6299 |
| Recall | 0.5980 |
| F1 | 0.5567 |
| Test samples | 181,874 |
### Per-class breakdown
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| safe | 0.5219 | 0.8570 | 0.6487 | 83,140 |
| unsafe | 0.7379 | 0.3391 | 0.4646 | 98,734 |
### Confusion matrix

| | Predicted safe | Predicted unsafe |
|---|---|---|
| Actual safe | TN=71,247 | FP=11,893 |
| Actual unsafe | FN=65,258 | TP=33,476 |
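The headline metrics can be recomputed directly from these counts. A minimal sanity-check sketch (treating "unsafe" as the positive class, which is how the cells above are labeled):

```python
# Recompute the reported macro-averaged metrics from the confusion matrix.
tn, fp, fn, tp = 71_247, 11_893, 65_258, 33_476
total = tn + fp + fn + tp  # 181,874 test samples

accuracy = (tp + tn) / total

# "safe" class: predicted-safe column holds TN (correct) and FN (wrong).
prec_safe = tn / (tn + fn)
rec_safe = tn / (tn + fp)
f1_safe = 2 * prec_safe * rec_safe / (prec_safe + rec_safe)

# "unsafe" class: predicted-unsafe column holds TP (correct) and FP (wrong).
prec_unsafe = tp / (tp + fp)
rec_unsafe = tp / (tp + fn)
f1_unsafe = 2 * prec_unsafe * rec_unsafe / (prec_unsafe + rec_unsafe)

macro_f1 = (f1_safe + f1_unsafe) / 2
print(f"accuracy={accuracy:.4f}  macro_f1={macro_f1:.4f}")
# accuracy=0.5758  macro_f1=0.5567
```

This matches the summary table, and makes the failure mode explicit: the model over-predicts "safe" (recall on the unsafe class is only 0.3391).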
## Training Setup

| Setting | Value |
|---|---|
| Method | TED |
| Teacher | Qwen/Qwen3-4B-SafeRL |
| Student base | Qwen/Qwen3-1.7B |
| Train data | 50,000 dialect-stratified DIA-GUARD samples |
| Epochs | 1 |
| Framework | HuggingFace transformers + accelerate |
## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB")
model = AutoModelForCausalLM.from_pretrained(
    "jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB",
    torch_dtype="bfloat16",
)

# The model is prompted as a text classifier and generates the label as text.
prompt = "Classify the following text as safe or unsafe.\n\nText: Hello\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
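Because the label arrives as free-form generated text (the decode above returns the prompt plus the continuation), downstream code needs to extract it. A minimal parsing sketch; `parse_label` is our own illustrative helper, not part of this repository:

```python
def parse_label(generated: str, prompt: str) -> str:
    """Strip the echoed prompt and normalize the first generated word
    to one of the two class labels, or 'unknown' if neither appears."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    words = completion.strip().split()
    first = words[0].lower().rstrip(".,!") if words else ""
    return first if first in {"safe", "unsafe"} else "unknown"

prompt = "Classify the following text as safe or unsafe.\n\nText: Hello\n\nAnswer:"
print(parse_label(prompt + " Safe.", prompt))     # safe
print(parse_label(prompt + " unsafe", prompt))    # unsafe
print(parse_label(prompt + " I think", prompt))   # unknown
```

Mapping anything unparseable to `unknown` (rather than defaulting to `safe`) is the safer choice for a guard model.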