# Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB

- **Student:** Qwen/Qwen3-1.7B
- **Teacher:** Qwen/Qwen3-4B-SafeRL
- **KD method:** TED
- **Scenario:** OOB (out-of-box: neither teacher nor student was fine-tuned on DIA-GUARD before KD)
Part of the DIA-GUARD dialect-aware safety classifier suite. This checkpoint was produced by distilling the off-the-shelf Qwen3-4B-SafeRL safety teacher into the smaller Qwen3-1.7B student on 50,000 dialect-stratified samples from the DIA-GUARD train split, then evaluating on the full 181,874-sample dialect holdout test set.
## Test Set Results

Precision, recall, and F1 below are macro-averaged over the two classes.
| Metric | Value |
|---|---|
| Accuracy | 0.5758 |
| Precision | 0.6299 |
| Recall | 0.5980 |
| F1 | 0.5567 |
| Test samples | 181,874 |
### Per-class breakdown
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| safe | 0.5219 | 0.8570 | 0.6487 | 83,140 |
| unsafe | 0.7379 | 0.3391 | 0.4646 | 98,734 |
### Confusion matrix

| | Predicted safe | Predicted unsafe |
|---|---|---|
| Actual safe | TN=71,247 | FP=11,893 |
| Actual unsafe | FN=65,258 | TP=33,476 |
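The headline metrics can be recomputed directly from these counts. A minimal sanity-check sketch (treating "unsafe" as the positive class, which is how the cells above are labeled):

```python
# Recompute the reported macro-averaged metrics from the confusion matrix.
tn, fp, fn, tp = 71_247, 11_893, 65_258, 33_476
total = tn + fp + fn + tp  # 181,874 test samples

accuracy = (tp + tn) / total

# "safe" class: predicted-safe column holds TN (correct) and FN (wrong).
prec_safe = tn / (tn + fn)
rec_safe = tn / (tn + fp)
f1_safe = 2 * prec_safe * rec_safe / (prec_safe + rec_safe)

# "unsafe" class: predicted-unsafe column holds TP (correct) and FP (wrong).
prec_unsafe = tp / (tp + fp)
rec_unsafe = tp / (tp + fn)
f1_unsafe = 2 * prec_unsafe * rec_unsafe / (prec_unsafe + rec_unsafe)

macro_f1 = (f1_safe + f1_unsafe) / 2
print(f"accuracy={accuracy:.4f}  macro_f1={macro_f1:.4f}")
# accuracy=0.5758  macro_f1=0.5567
```

This matches the summary table, and makes the failure mode explicit: the model over-predicts "safe" (recall on the unsafe class is only 0.3391).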
## Training Setup

| Setting | Value |
|---|---|
| Method | TED |
| Teacher | Qwen/Qwen3-4B-SafeRL |
| Student base | Qwen/Qwen3-1.7B |
| Train data | 50,000 dialect-stratified DIA-GUARD samples |
| Epochs | 1 |
| Framework | HuggingFace transformers + accelerate |
## How to use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB")
model = AutoModelForCausalLM.from_pretrained(
    "jsl5710/Shield-Qwen3-1.7B-KD-TED-Qwen3-4B-SafeRL-OOB",
    torch_dtype="bfloat16",
)

# The model is prompted as a text classifier and generates the label as text.
prompt = "Classify the following text as safe or unsafe.\n\nText: Hello\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
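Because the label arrives as free-form generated text (the decode above returns the prompt plus the continuation), downstream code needs to extract it. A minimal parsing sketch; `parse_label` is our own illustrative helper, not part of this repository:

```python
def parse_label(generated: str, prompt: str) -> str:
    """Strip the echoed prompt and normalize the first generated word
    to one of the two class labels, or 'unknown' if neither appears."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    words = completion.strip().split()
    first = words[0].lower().rstrip(".,!") if words else ""
    return first if first in {"safe", "unsafe"} else "unknown"

prompt = "Classify the following text as safe or unsafe.\n\nText: Hello\n\nAnswer:"
print(parse_label(prompt + " Safe.", prompt))     # safe
print(parse_label(prompt + " unsafe", prompt))    # unsafe
print(parse_label(prompt + " I think", prompt))   # unknown
```

Mapping anything unparseable to `unknown` (rather than defaulting to `safe`) is the safer choice for a guard model.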