The one done with GPTQ seems better than this one.

#1
by ibaldonl - opened

Hi,

In my tests, the https://huggingface.co/dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16 (GPTQ) model produced results closer to the unquantized model than this one (AWQ) did.

So make sure to run your tests with both; if you can't and are in doubt, use the GPTQ one instead of this one.
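For anyone who wants to put a number on "more similar results", here is a minimal sketch of the kind of check I mean: rank-correlate each quantized model's relevance scores against the unquantized baseline. The scores below are made-up illustrative numbers, not real model outputs.

```python
import numpy as np

def rankdata(x):
    # assign each score its rank (0 = lowest); no tie handling for brevity
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(a, b):
    # Spearman rank correlation = Pearson correlation of the ranks
    ra, rb = rankdata(np.asarray(a)), rankdata(np.asarray(b))
    return float(np.corrcoef(ra, rb)[0, 1])

# hypothetical relevance scores for the same six query-document pairs
fp16 = [0.91, 0.40, 0.75, 0.12, 0.63, 0.88]  # unquantized baseline
gptq = [0.90, 0.42, 0.73, 0.10, 0.65, 0.86]  # same ordering as baseline
awq  = [0.89, 0.48, 0.45, 0.11, 0.66, 0.85]  # two documents swap ranks

print("GPTQ vs fp16:", spearman(fp16, gptq))  # ordering preserved
print("AWQ  vs fp16:", spearman(fp16, awq))   # ordering degraded
```

For a reranker, preserving the ordering of the baseline scores matters more than the absolute score values, which is why a rank correlation is a reasonable quick metric here.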

Hope this helps, have fun!

Dolfs AI org
edited Feb 7

Interesting results! In any case, AWQ is faster than GPTQ, and on some models AWQ even outperforms GPTQ, but that does not seem to be the case here.

Since these are rerankers, there may be some issue with the calibration datasets, the llm-compressor mappings, etc.

It would be interesting to run some benchmarks on these and see whether the numbers match my manual testing, but I am overloaded with work, and with a 7-month-old baby when I am not working, so unfortunately I won't be able to do this soon.

Thanks a lot for doing this work!!!

Dolfs AI org
edited Feb 18

> Since these are rerankers, there may be some issue with the calibration datasets, the llm-compressor mappings, etc.
>
> It would be interesting to run some benchmarks on these and see whether the numbers match my manual testing, but I am overloaded with work, and with a 7-month-old baby when I am not working, so unfortunately I won't be able to do this soon.
>
> Thanks a lot for doing this work!!!

Thanks for the comments! FYI, both versions were quantized with the same process, mappings, and calibration dataset; only the recipe changed from AWQ to GPTQ. Also, the GPTQ run included a SmoothQuant pass (maybe that makes the difference).
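For what it's worth, the intuition behind that SmoothQuant pass can be shown with a toy numpy sketch (synthetic data, per-tensor int8, not the actual llm-compressor implementation): it migrates activation outliers into the weights before quantizing, so neither tensor has an extreme dynamic range.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(t):
    # symmetric per-tensor int8 fake-quantization
    scale = np.abs(t).max() / 127.0
    return np.round(t / scale).clip(-127, 127) * scale

n, d_in, d_out = 32, 8, 4
X = rng.normal(0, 1, (n, d_in))
X[:, 0] *= 100.0                      # one outlier activation channel
W = rng.normal(0, 1, (d_in, d_out))

ref = X @ W                           # full-precision reference

# naive W8A8: the outlier channel blows up the activation scale
naive = quantize(X) @ quantize(W)

# SmoothQuant: per-channel factor s_j = max|X_j|^a / max|W_j|^(1-a),
# then quantize the smoothed pair (X / s, diag(s) @ W) instead
alpha = 0.5
s = (np.abs(X).max(0) ** alpha) / (np.abs(W).max(1) ** (1 - alpha))
smooth = quantize(X / s) @ quantize(s[:, None] * W)

err_naive = np.abs(naive - ref).mean()
err_smooth = np.abs(smooth - ref).mean()
print(f"naive: {err_naive:.3f}  smoothed: {err_smooth:.3f}")
```

With the outlier present, the smoothed pair reconstructs the matmul noticeably better than the naive quantization; that kind of activation-outlier sensitivity is one plausible reason the two recipes diverge on a reranker.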
