gemma-4-E2B-it — 4-bit NF4 Quantized

4-bit NF4 quantized version of google/gemma-4-E2B-it.

Audio, video, and image inputs are fully preserved. Only the LLM backbone linear layers are quantized. Audio and image encoders remain at bfloat16.

Quantization details

bnb_4bit_quant_type: nf4
bnb_4bit_use_double_quant: True
bnb_4bit_compute_dtype: bfloat16

Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model = AutoModelForImageTextToText.from_pretrained(
    "derkar00/gemma-4-E2B-it-4bit-nf4",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
processor = AutoProcessor.from_pretrained("derkar00/gemma-4-E2B-it-4bit-nf4")

Downloads last month: 47

Safetensors

Model size

5B params

Tensor type

F32

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for derkar00/gemma-4-E2B-it-4bit-nf4

Base model

google/gemma-4-E2B

Finetuned

google/gemma-4-E2B-it

Quantized

(191)

this model