Qwen3-4B-SafeRL - GGUF Quantized Versions

This repository provides GGUF quantized versions of Qwen/Qwen3-4B-SafeRL, converted with llama.cpp.

The base model was first exported from Hugging Face format to GGUF (FP16) and then quantized into multiple formats. These variants offer different trade-offs between model size, inference speed, and output quality.


🔧 Model Details

  • Base model: Qwen/Qwen3-4B-SafeRL
  • Architecture: Qwen 3 (4B parameters)
  • Format: GGUF
  • Intended use: Safe RL research & alignment tasks
  • Conversion tool: convert_hf_to_gguf.py (from llama.cpp)
  • Quantization tool: llama-quantize

📊 Quantized Versions

| Quantization | Filename | Size (GiB) | Notes |
|---|---|---|---|
| FP16 | Qwen3-4B-SafeRL-FP16.gguf | ~8.05 | Full precision (baseline) |
| Q2_K | Qwen3-4B-SafeRL-Q2_K.gguf | ~1.67 | Smallest, lowest accuracy |
| Q3_K_M | Qwen3-4B-SafeRL-Q3_K_M.gguf | ~2.08 | Balanced small size |
| Q4_0 | Qwen3-4B-SafeRL-Q4_0.gguf | ~2.37 | Good balance, faster |
| Q4_K_M | Qwen3-4B-SafeRL-Q4_K_M.gguf | ~2.50 | Standard, widely used |
| Q5_K_M | Qwen3-4B-SafeRL-Q5_K_M.gguf | ~2.89 | Better accuracy |
| Q6_K | Qwen3-4B-SafeRL-Q6_K.gguf | ~3.31 | High accuracy |
| Q8_0 | Qwen3-4B-SafeRL-Q8_0.gguf | ~4.28 | Near-FP16 quality |
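
As a rough sanity check on the table, the effective bits per weight implied by each file size can be computed directly. The 4B parameter count is the nominal figure from the model card, so these numbers are approximate; K-quants land above their nominal bit width because of per-block scales and tensors (e.g. embeddings) kept at higher precision:

```python
GIB = 2**30      # table sizes are in GiB
N_PARAMS = 4e9   # nominal parameter count from the model card (approximate)

def bits_per_weight(size_gib: float, n_params: float = N_PARAMS) -> float:
    """Effective bits per parameter implied by an on-disk GGUF size."""
    return size_gib * GIB * 8 / n_params

# Q4_K_M at ~2.50 GiB works out to roughly 5.4 bits/weight
print(round(bits_per_weight(2.50), 1))  # -> 5.4
```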

🚀 Usage

🖥️ llama.cpp

```bash
# In current llama.cpp builds the CLI binary is llama-cli (formerly ./main)
./llama-cli -m Qwen3-4B-SafeRL-Q4_K_M.gguf -p "Hello, SafeRL!"
```

🐍 Python

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the Q4_K_M variant from this repository on the Hub
model_path = hf_hub_download(
    repo_id="ShahzebKhoso/Qwen3-4B-SafeRL-GGUF",
    filename="Qwen3-4B-SafeRL-Q4_K_M.gguf",
)

# Load the model (set n_ctx / n_gpu_layers here to tune context size and offloading)
llm = Llama(model_path=model_path)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a safe RL assistant."},
        {"role": "user", "content": "Hello, SafeRL!"},
    ],
    max_tokens=100,
)

print(output["choices"][0]["message"]["content"])
```

These GGUF versions are optimized for fast inference with CPU/GPU runtimes like llama.cpp, Ollama, and LM Studio.
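
For Ollama specifically, a downloaded file can be wrapped in a minimal Modelfile. A sketch, assuming the Q4_K_M file sits in the working directory (the local model name `qwen3-4b-saferl` is illustrative):

```shell
# Point a Modelfile at the local GGUF file
cat > Modelfile <<'EOF'
FROM ./Qwen3-4B-SafeRL-Q4_K_M.gguf
EOF

# Register the model with Ollama, then chat with it
ollama create qwen3-4b-saferl -f Modelfile
ollama run qwen3-4b-saferl "Hello, SafeRL!"
```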
