QuantTrio/Qwen3.5-397B-A17B-AWQ response is !

#5
by duyuting - opened

After the model is launched, every response it returns consists only of exclamation marks (!). What could be causing this? The a10-awq model works normally in the same environment.

Device: A800

export CONTEXT_LENGTH=32768
export CUDA_VISIBLE_DEVICES="0,1,2,3"
vllm serve \
  //models/Qwen3.5/Qwen3.5-397B-A17B-AWQ \
  --served-model-name Qwen3.5-397B-A17B-AWQ \
  --enable-expert-parallel \
  --swap-space 16 \
  --max-num-seqs 32 \
  --max-model-len $CONTEXT_LENGTH \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 4 \
  --reasoning-parser qwen3 \
  --mm-processor-cache-type shm \
  --mm-encoder-tp-mode data \
  --enable-prefix-caching \
  --host 0.0.0.0 \
  --port 8086
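For anyone trying to reproduce this, a minimal sketch of a client call against the server started above (assuming the OpenAI-compatible API that `vllm serve` exposes by default and the official `openai` Python client; host, port, and model name are taken from the command and may need adjusting):

```python
# Reproduce the issue: send one chat request to the vLLM server started above.
from openai import OpenAI

# base_url/port and served model name come from the serve command; adjust to your setup.
client = OpenAI(base_url="http://localhost:8086/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen3.5-397B-A17B-AWQ",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)  # when the bug occurs, this is only "!!!!!..."
```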

QuantTrio org

https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3

Double-check that you are not using CUDA 12.8/13.0.

I'm using CUDA 13.0 and am getting "!!!!!". I cleared the cache and still get nothing but "!!!!!".

QuantTrio org

> I'm using CUDA 13.0 and am getting "!!!!!". I cleared the cache and still get nothing but "!!!!!".

Did you use the same Docker image as the others?

I was using vllm/vllm-openai:cu130-nightly

> https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3
>
> Double-check that you are not using CUDA 12.8/13.0.

My torch CUDA version is 12.8:

torch.version.cuda
'12.8'
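For completeness, a small sketch that gathers the version information being discussed here in one place (assuming torch and vllm are importable in the serving environment):

```python
# Environment check: CUDA toolkit torch was built against, vLLM version, visible GPUs.
import torch
import vllm

print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)  # e.g. '12.8'
print("vllm:", vllm.__version__)
print("visible GPUs:", torch.cuda.device_count())
```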

Please download the new config.json file from this repo and replace your local copy, then try again. Let me know if this resolves the issue.
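One way to fetch just the updated config.json is with huggingface_hub; this is only a sketch (the local model path is illustrative, matching the serve command above, and should be adjusted to your layout):

```python
# Download the updated config.json from the repo and overwrite the local copy.
from huggingface_hub import hf_hub_download
import shutil

new_cfg = hf_hub_download(
    repo_id="QuantTrio/Qwen3.5-397B-A17B-AWQ",
    filename="config.json",
    force_download=True,  # bypass any stale cached copy
)
shutil.copy(new_cfg, "/models/Qwen3.5/Qwen3.5-397B-A17B-AWQ/config.json")
print("Replaced local config.json with", new_cfg)
```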

> Please download the new config.json file from this repo and replace your local copy, then try again. Let me know if this resolves the issue.

This config.json works!
