---
base_model:
  - Qwen/Qwen3.5-397B-A17B
tags:
  - qwen
  - fp8
  - vllm
  - compressed-tensors
name: RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic
---

# FP8 Quantized Qwen3.5-397B-A17B

This is a preliminary version (subject to change) of the FP8-quantized Qwen/Qwen3.5-397B-A17B model. Both weights and activations are quantized to the FP8 format with vllm-project/llm-compressor.
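To give intuition for what "FP8-dynamic" means, here is a minimal NumPy sketch (not this model's actual code path) of dynamic per-tensor FP8 E4M3 fake-quantization: the scale is derived from the tensor itself at runtime, values are rounded to 3 explicit mantissa bits, and the result is mapped back to full precision. Subnormals, NaNs, and all-zero tensors are ignored for brevity.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_quant_fp8_dynamic(x: np.ndarray) -> np.ndarray:
    """Simulate dynamic per-tensor FP8 (E4M3) quantize -> dequantize.

    "Dynamic" means the scale comes from the tensor's own absolute max
    at runtime, rather than from an offline calibration pass.
    Assumes a nonzero tensor; subnormal/NaN handling is omitted.
    """
    scale = np.abs(x).max() / FP8_E4M3_MAX       # per-tensor dynamic scale
    y = x / scale                                # map into the FP8 range
    m, e = np.frexp(y)                           # y = m * 2**e, |m| in [0.5, 1)
    m = np.round(m * 16.0) / 16.0                # keep 3 explicit mantissa bits
    y = np.ldexp(m, e)                           # reassemble the rounded value
    y = np.clip(y, -FP8_E4M3_MAX, FP8_E4M3_MAX)  # clamp to representable range
    return y * scale                             # dequantize back to full scale

x = np.random.randn(4096)
x_hat = fake_quant_fp8_dynamic(x)
# Mantissa rounding bounds the per-element relative error by 1/16.
rel_err = np.abs(x - x_hat).max() / np.abs(x).max()
```

With 4 significant bits per value, each element lands within about 6% of its original value, which is why weight/activation FP8 tends to preserve accuracy well.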

It is compatible with and tested against vLLM `main`. Deploy it with `vllm serve RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic`.
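For readers curious how such a checkpoint is typically produced, an llm-compressor recipe for FP8-dynamic quantization generally looks like the sketch below. This is an illustrative config fragment, not the exact recipe used for this checkpoint; the stage name and field values are assumptions.

```yaml
# Hypothetical llm-compressor recipe sketch for FP8-dynamic quantization.
# Quantizes all Linear layers' weights and activations to FP8, with
# runtime (dynamic) activation scales; the lm_head is typically skipped.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: FP8_DYNAMIC
      ignore: ["lm_head"]
```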

## Preliminary Evaluations

1. GSM8k, measured via vLLM's `tests/evals/gsm8k/gsm8k_eval.py`, shows almost no degradation of accuracy:

   |          | Qwen/Qwen3.5-397B-A17B | RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic (this model) |
   |----------|------------------------|-----------------------------------------------------|
   | Accuracy | 89.5                   | 89.4                                                |
   | Recovery | -                      | 99.9%                                               |
2. Under greedy sampling, the model generates almost identical text to the unquantized baseline (Qwen/Qwen3.5-397B-A17B on the left, RedHatAI/Qwen3.5-397B-A17B-FP8-dynamic on the right):

*(Side-by-side greedy-decoding comparison image.)*
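The recovery figure in the table above is simply the ratio of quantized to baseline GSM8k accuracy; a quick check:

```python
baseline = 89.5   # GSM8k accuracy of Qwen/Qwen3.5-397B-A17B
quantized = 89.4  # GSM8k accuracy of this FP8-dynamic checkpoint

recovery = quantized / baseline * 100
print(f"{recovery:.1f}%")  # → 99.9%
```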

Note: More rigorous evaluations are currently in progress and will be available soon.