FLUX.2-dev + Turbo (Merged) - GGUF[Q4_K_M]

Overview

This repository contains the 32B FLUX.2-dev model with the Turbo LoRA merged ("baked") directly into the UNET weights, quantized to Q4_K_M format using a custom llama.cpp build patched for the FLUX architecture.
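Conceptually, "baking in" a LoRA means folding its low-rank update into each affected base weight before quantization. Below is a minimal sketch of that fold, assuming standard A/B LoRA factor matrices; the shapes and names are illustrative, not the actual FLUX checkpoint keys:

```python
import torch

def merge_lora_into_weight(W: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                           scale: float = 1.0) -> torch.Tensor:
    """Fold a LoRA update into a base weight: W' = W + scale * (B @ A).

    W: (out_features, in_features) base weight
    A: (rank, in_features) LoRA down-projection
    B: (out_features, rank) LoRA up-projection
    """
    return W + scale * (B.to(W.dtype) @ A.to(W.dtype))

# Illustrative usage with random tensors (a real merge iterates over the
# checkpoint's layer keys before quantization):
W = torch.randn(4096, 4096)
A = torch.randn(16, 4096)   # rank-16 LoRA (rank chosen for illustration)
B = torch.randn(4096, 16)
W_merged = merge_lora_into_weight(W, A, B, scale=1.0)  # LoRA weight 1.0, as in this repo
```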

Why use this merged version?

If you load a base GGUF model and apply the LoRA dynamically with a LoRA node in ComfyUI, the engine applies "lowvram patches" (dequantizing layers on the fly during generation so the LoRA delta can be added). For a 32B model, this drastically reduces generation speed, even if the model fits in your VRAM.
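For intuition, here is a conceptual contrast (not ComfyUI's actual implementation) between the two paths; in the dynamic case the extra low-rank matmul and patch bookkeeping recur on every layer of every sampling step:

```python
import torch

def forward_dynamic(x, W_q, dequantize, A, B, scale=1.0):
    # Dynamic-LoRA path: the quantized weight is dequantized and the
    # LoRA delta re-applied every time this layer runs.
    W = dequantize(W_q) + scale * (B @ A)
    return x @ W.T

def forward_merged(x, W_q, dequantize):
    # Pre-merged path: the LoRA delta is already inside the stored
    # weight, so only the plain dequantize + matmul remains.
    return x @ dequantize(W_q).T

# Example with full-precision tensors and an identity "dequantize":
x = torch.randn(1, 4096)
W = torch.randn(4096, 4096)
A, B = torch.randn(16, 4096), torch.randn(4096, 16)
y = forward_dynamic(x, W, lambda w: w, A, B)
```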

By using this pre-merged GGUF file:

  • No lowvram patches are applied.
  • The model fits entirely into 24 GB of VRAM (full load: True on an RTX 3090/4090).
  • Maximizes performance when paired with SageAttention or FlashAttention.

File Details

  • Base Model: FLUX.2-dev (32B parameters)
  • LoRA Applied: FLUX.2-dev-Turbo (Weight: 1.0)
  • Format: GGUF
  • Quantization: Q4_K_M
  • File Size: ~17.8 GB
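As a sanity check, the file size matches the quantization level. Q4_K_M mixes 4-bit and 6-bit blocks and averages roughly 4.5 bits per weight (an approximation, not an exact figure):

```python
# Back-of-the-envelope size check for a 32B-parameter model at Q4_K_M.
params = 32e9
bits_per_weight = 4.5               # approximate average for Q4_K_M
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")         # ~18.0 GB, in line with the ~17.8 GB file
```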

How to use in ComfyUI

  1. Download the flux2-dev-turbo-Q4_K_M.gguf file (a scripted download option is sketched after this list).
  2. Place it in your ComfyUI/models/unet/ directory.
  3. Use the Unet Loader (GGUF) node to load the model.
  4. ⚠️ IMPORTANT: DO NOT use a Load LoRA node for the Turbo LoRA. The weights are already baked into this UNET. Connect it directly to your sampler.
  5. Set up your sampler for Turbo (e.g., 8 steps, CFG 1.0-1.5, depending on the Turbo LoRA requirements).
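If you prefer to script step 1, here is a minimal sketch using huggingface_hub (adjust local_dir to wherever your ComfyUI install lives):

```python
from huggingface_hub import hf_hub_download

# Download the quantized UNET straight into ComfyUI's unet folder.
path = hf_hub_download(
    repo_id="UDream/flux2-dev-turbo-Q4_K_M",
    filename="flux2-dev-turbo-Q4_K_M.gguf",
    local_dir="ComfyUI/models/unet",  # adjust to your ComfyUI install
)
print(path)
```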

Hardware Requirements

  • RAM: 32 GB+ recommended
  • VRAM: ~24 GB (fits comfortably on an RTX 3090 / 4090 alongside a quantized Mistral-3 encoder).
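To confirm your GPU has the headroom before loading, a quick check with PyTorch (assumes CUDA device 0):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3
    # ~17.8 GB for the UNET plus the text encoder and activations;
    # 24 GB is the comfortable target.
    status = "OK" if total_gb >= 24 else "may need offloading"
    print(f"{props.name}: {total_gb:.1f} GB VRAM ({status})")
else:
    print("No CUDA device found.")
```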

Credits & Acknowledgements

Note: Please adhere to the original FLUX.2-dev non-commercial license when using this model.
