The one done with GPTQ seems better than this one.
Hi,
in my tests, the https://huggingface.co/dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16 (GPTQ) one produced results closer to the unquantized model than this one here (AWQ) did.
So make sure to run your tests with both, or if you can't and are in doubt, use the GPTQ one instead of this.
Hope this helps, have fun!
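For the "run your tests with both" part, here is a minimal sketch of one way to quantify how closely a quantized reranker tracks the unquantized baseline: top-1 agreement over a set of queries. All the score lists below are placeholder numbers I made up for illustration; in practice you would collect them by running the same (query, document) pairs through each model.

```python
# Hypothetical sketch: measure how often a quantized reranker picks the
# same top document as the unquantized baseline. Scores are placeholders.

def rank_order(scores):
    """Return document indices sorted by descending relevance score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def top1_agreement(baseline, candidate):
    """Fraction of queries where both models rank the same document first."""
    hits = sum(
        rank_order(b)[0] == rank_order(c)[0]
        for b, c in zip(baseline, candidate)
    )
    return hits / len(baseline)

# Placeholder relevance scores, one list per query (3 candidate docs each).
baseline_scores = [[0.91, 0.40, 0.12], [0.20, 0.85, 0.33]]
gptq_scores     = [[0.89, 0.42, 0.15], [0.25, 0.83, 0.30]]
awq_scores      = [[0.55, 0.60, 0.10], [0.22, 0.81, 0.35]]

print("GPTQ top-1 agreement:", top1_agreement(baseline_scores, gptq_scores))
print("AWQ  top-1 agreement:", top1_agreement(baseline_scores, awq_scores))
```

You could extend this with a rank-correlation metric (e.g. Kendall's tau) for a finer-grained comparison, but top-1 agreement is usually the number that matters for rerankers feeding a RAG pipeline.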
Interesting results! Regarding AWQ vs. GPTQ: AWQ is faster than GPTQ, and on some models AWQ even outperforms GPTQ, but that doesn't seem to be the case here.
Since these are rerankers, maybe there is some issue with the calibration dataset, the llm-compressor mappings, etc.
It would be interesting to run some benchmarks on these and see whether the numbers match my manual testing, but between work and a 7-month-old baby I won't be able to do this soon, unfortunately.
Thanks a lot for doing this work!!!
Thanks for the comments! FYI, both versions were quantized with the same process, mappings, and calibration dataset; only the recipe changed, from AWQ to GPTQ. The GPTQ run also included a SmoothQuant pass, which may be what makes the difference.
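For anyone wanting to reproduce this, the GPTQ + SmoothQuant combination described above would look roughly like the following llm-compressor recipe. This is only an illustrative sketch, not the actual recipe used for these models: the modifier names (`SmoothQuantModifier`, `GPTQModifier`) exist in llm-compressor, but the specific parameter values here (`smoothing_strength`, targets, ignore list) are assumptions on my part.

```yaml
# Illustrative llm-compressor recipe sketch (not the exact recipe used here).
# The AWQ variant would swap the modifiers below for an AWQ-based recipe
# and drop the SmoothQuant pass.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8   # assumed value, not confirmed by the author
    GPTQModifier:
      targets: ["Linear"]
      ignore: ["lm_head"]
      scheme: W4A16
```

Since the calibration dataset and mappings were identical across both runs, the SmoothQuant pass is the main variable left, which fits the hypothesis that it explains the quality gap.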