QuantTrio/Qwen3.5-397B-A17B-AWQ response is !

#5
by duyuting - opened

After the model is launched, every response it returns consists only of exclamation marks (!). What could be causing this? The a10-awq model works normally in the same environment.

Device: A800

export CONTEXT_LENGTH=32768
export CUDA_VISIBLE_DEVICES="0,1,2,3"
vllm serve \
  //models/Qwen3.5/Qwen3.5-397B-A17B-AWQ \
  --served-model-name Qwen3.5-397B-A17B-AWQ \
  --enable-expert-parallel \
  --swap-space 16 \
  --max-num-seqs 32 \
  --max-model-len $CONTEXT_LENGTH \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 4 \
  --reasoning-parser qwen3 \
  --mm-processor-cache-type shm \
  --mm-encoder-tp-mode data \
  --enable-prefix-caching \
  --host 0.0.0.0 \
  --port 8086
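For anyone trying to reproduce this, a minimal sketch of a client call against the server started above (assuming the OpenAI-compatible API that `vllm serve` exposes by default and the official `openai` Python client; host, port, and model name are taken from the command and may need adjusting):

```python
# Reproduce the issue: send one chat request to the vLLM server started above.
from openai import OpenAI

# base_url/port and served model name come from the serve command; adjust to your setup.
client = OpenAI(base_url="http://localhost:8086/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3.5-397B-A17B-AWQ",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)  # when the bug occurs, this is only "!!!!!..."
```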

QuantTrio org

https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3

Double-check that you are not using CUDA 12.8/13.0.

I'm using CUDA 13.0 and am getting "!!!!!". I cleared the cache and still get nothing but "!!!!!".

QuantTrio org

> I'm using CUDA 13.0 and am getting "!!!!!". I cleared the cache and still get nothing but "!!!!!".

Did you use the same Docker image as the others?

I was using vllm/vllm-openai:cu130-nightly

> https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3
>
> Double-check that you are not using CUDA 12.8/13.0.

My torch CUDA version is 12.8:

torch.version.cuda
'12.8'
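For completeness, a small sketch that gathers the version information being discussed here in one place (assuming torch and vllm are importable in the serving environment):

```python
# Environment check: CUDA toolkit torch was built against, vLLM version, visible GPUs.
import torch
import vllm

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)  # e.g. '12.8'
print("vllm:", vllm.__version__)
print("visible GPUs:", torch.cuda.device_count())
```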

Please download the new config.json file from this repo and replace your local copy, then try again. Let me know if this resolves the issue.
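One way to fetch just the updated config.json is with huggingface_hub; this is only a sketch (the local model path is illustrative, matching the serve command above, and should be adjusted to your layout):

```python
# Download the updated config.json from the repo and overwrite the local copy.
from huggingface_hub import hf_hub_download
import shutil

new_cfg = hf_hub_download(
    repo_id="QuantTrio/Qwen3.5-397B-A17B-AWQ",
    filename="config.json",
    force_download=True,  # bypass any stale cached copy
)
shutil.copy(new_cfg, "/models/Qwen3.5/Qwen3.5-397B-A17B-AWQ/config.json")
print("Replaced local config.json with", new_cfg)
```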

> Please download the new config.json file from this repo and replace your local copy, then try again. Let me know if this resolves the issue.

This config.json works!
