Qwen3-4B-SafeRL - GGUF Quantized Versions

This repository provides GGUF quantized versions of Qwen/Qwen3-4B-SafeRL, converted with llama.cpp.

The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.


🔧 Model Details

  • Base model: Qwen/Qwen3-4B-SafeRL
  • Architecture: Qwen 3 (4B parameters)
  • Format: GGUF
  • Intended use: Safe RL research & alignment tasks
  • Conversion tool: convert_hf_to_gguf.py (from llama.cpp)
  • Quantization tool: llama-quantize

📊 Quantized Versions

| Quantization | Filename | Size (GiB) | Notes |
|---|---|---|---|
| FP16 | Qwen3-4B-SafeRL-FP16.gguf | ~8.05 | Full precision (baseline) |
| Q2_K | Qwen3-4B-SafeRL-Q2_K.gguf | ~1.67 | Smallest, lowest accuracy |
| Q3_K_M | Qwen3-4B-SafeRL-Q3_K_M.gguf | ~2.08 | Balanced small size |
| Q4_0 | Qwen3-4B-SafeRL-Q4_0.gguf | ~2.37 | Good balance, faster |
| Q4_K_M | Qwen3-4B-SafeRL-Q4_K_M.gguf | ~2.50 | Standard, widely used |
| Q5_K_M | Qwen3-4B-SafeRL-Q5_K_M.gguf | ~2.89 | Better accuracy |
| Q6_K | Qwen3-4B-SafeRL-Q6_K.gguf | ~3.31 | High accuracy |
| Q8_0 | Qwen3-4B-SafeRL-Q8_0.gguf | ~4.28 | Near-FP16 quality |
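
As a rough sanity check on the table, the effective bits per weight implied by each file size can be computed directly. The 4B parameter count is the nominal figure from the model card, so these numbers are approximate; K-quants land above their nominal bit width because of per-block scales and tensors (e.g. embeddings) kept at higher precision:

```python
GIB = 2**30      # table sizes are in GiB
N_PARAMS = 4e9   # nominal parameter count from the model card (approximate)

def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    """Effective bits per parameter implied by an on-disk GGUF size."""
    return size_gib * GIB * 8 / n_params

# Q4_K_M at ~2.50 GiB works out to roughly 5.4 bits/weight
print(round(bits_per_weight(2.50), 1))  # -> 5.4
```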

🚀 Usage

🖥️ llama.cpp

```bash
# In current llama.cpp builds the CLI binary is llama-cli (formerly ./main)
./llama-cli -m Qwen3-4B-SafeRL-Q4_K_M.gguf -p "Hello, SafeRL!"
```

🐍 Python

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the Q4_K_M variant from this repository on the Hub
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/Qwen3-4B-SafeRL-GGUF",
    filename="Qwen3-4B-SafeRL-Q4_K_M.gguf",
)

# Load the model (set n_ctx / n_gpu_layers here to tune context size and offloading)
llm = Llama(model_path=model_path)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a safe RL assistant."},
        {"role": "user", "content": "Hello, SafeRL!"},
    ],
    max_tokens=100,
)

print(output["choices"][0]["message"]["content"])
```

These GGUF versions are optimized for fast inference with CPU/GPU runtimes like llama.cpp, Ollama, and LM Studio.
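
For Ollama specifically, a downloaded file can be wrapped in a minimal Modelfile. A sketch, assuming the Q4_K_M file sits in the working directory (the local model name `qwen3-4b-saferl` is illustrative):

```shell
# Point a Modelfile at the local GGUF file
cat > Modelfile <<'EOF'
FROM ./Qwen3-4B-SafeRL-Q4_K_M.gguf
EOF

# Register the model with Ollama, then chat with it
ollama create qwen3-4b-saferl -f Modelfile
ollama run qwen3-4b-saferl "Hello, SafeRL!"
```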
