QuantTrio/Qwen3.5-397B-A17B-AWQ response is !
After the model is launched, every response it returns consists of nothing but exclamation marks (!). What could be the cause of this issue? The a10-awq model works normally in this environment.
Device: A800

```shell
export CONTEXT_LENGTH=32768
export CUDA_VISIBLE_DEVICES="0,1,2,3"
vllm serve \
    /models/Qwen3.5/Qwen3.5-397B-A17B-AWQ \
    --served-model-name Qwen3.5-397B-A17B-AWQ \
    --enable-expert-parallel \
    --swap-space 16 \
    --max-num-seqs 32 \
    --max-model-len $CONTEXT_LENGTH \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 4 \
    --reasoning-parser qwen3 \
    --mm-processor-cache-type shm \
    --mm-encoder-tp-mode data \
    --enable-prefix-caching \
    --host 0.0.0.0 \
    --port 8086
```
https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3
Double-check that you're not using CUDA 12.8/13.0.
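For reference, a tiny hypothetical helper for that check. The version strings are what `torch.version.cuda` returns, and the "suspect" set simply encodes the builds called out in this thread — it is not an official compatibility list:

```python
# Hypothetical helper: flag torch CUDA builds that posters in this thread
# reported as producing the "!" output. Not an official compatibility list.
SUSPECT_CUDA_BUILDS = {"12.8", "13.0"}

def cuda_build_is_suspect(version: str) -> bool:
    """True if `version` (e.g. the string from torch.version.cuda)
    is one of the builds reported as problematic here."""
    return version in SUSPECT_CUDA_BUILDS

print(cuda_build_is_suspect("13.0"))  # True
print(cuda_build_is_suspect("12.4"))  # False
```

To get the actual string in your serving environment, run `python -c "import torch; print(torch.version.cuda)"`.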
I'm using CUDA 13.0 and am getting "!!!!!". I cleared the cache and still get nothing but exclamation marks.
Did you use the same Docker image as the others?
I was using vllm/vllm-openai:cu130-nightly
> https://huggingface.co/QuantTrio/Qwen3.5-397B-A17B-AWQ/discussions/3
> Double-check that you're not using CUDA 12.8/13.0.
My torch CUDA version is 12.8:

```python
>>> torch.version.cuda
'12.8'
```
Please download the new config.json file from this repo, replace your local copy, and try one more time. Let me know if this resolves the issue.
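In case it helps, a minimal sketch for pulling the updated config.json straight from the Hub. The `resolve/main` URL is the standard Hub raw-file endpoint; the destination path is taken from the serve command above and may need adjusting for your layout. The actual download line is commented out here since it needs network access:

```python
from urllib.request import urlretrieve  # stdlib; used by the commented-out download

repo = "QuantTrio/Qwen3.5-397B-A17B-AWQ"
url = f"https://huggingface.co/{repo}/resolve/main/config.json"
print(url)

# Overwrite the local copy, then restart vllm serve:
# urlretrieve(url, "/models/Qwen3.5/Qwen3.5-397B-A17B-AWQ/config.json")
```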
This config.json works!