The one done with GPTQ seems better than this one.
Hi,
in my tests, the https://huggingface.co/dolfsai/Qwen3-Reranker-4B-seq-cls-vllm-W4A16 (GPTQ) one produced results closer to the unquantized model than this one here (AWQ) did.
So make sure to run your tests with both, or if you can't and are in doubt, use the GPTQ one instead of this.
Hope this helps, have fun!
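For the "run your tests with both" part, here is a minimal sketch of one way to quantify how closely a quantized reranker tracks the unquantized baseline: top-1 agreement over a set of queries. All the score lists below are placeholder numbers I made up for illustration; in practice you would collect them by running the same (query, document) pairs through each model.

```python
# Hypothetical sketch: measure how often a quantized reranker picks the
# same top document as the unquantized baseline. Scores are placeholders.

def rank_order(scores):
    """Return document indices sorted by descending relevance score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def top1_agreement(baseline, candidate):
    """Fraction of queries where both models rank the same document first."""
    hits = sum(
        rank_order(b)[0] == rank_order(c)[0]
        for b, c in zip(baseline, candidate)
    )
    return hits / len(baseline)

# Placeholder relevance scores, one list per query (3 candidate docs each).
baseline_scores = [[0.91, 0.40, 0.12], [0.20, 0.85, 0.33]]
gptq_scores     = [[0.89, 0.42, 0.15], [0.25, 0.83, 0.30]]
awq_scores      = [[0.55, 0.60, 0.10], [0.22, 0.81, 0.35]]

print("GPTQ top-1 agreement:", top1_agreement(baseline_scores, gptq_scores))
print("AWQ  top-1 agreement:", top1_agreement(baseline_scores, awq_scores))
```

You could extend this with a rank-correlation metric (e.g. Kendall's tau) for a finer-grained comparison, but top-1 agreement is usually the number that matters for rerankers feeding a RAG pipeline.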
Interesting results! Regarding AWQ vs. GPTQ: AWQ is faster than GPTQ, and on some models AWQ even outperforms GPTQ, but that doesn't seem to be the case here.
Since these are rerankers, maybe there is some issue with the calibration dataset, the llm-compressor mappings, etc.
It would be interesting to run some benchmarks on these and see whether the numbers match my manual testing, but between work and a 7-month-old baby I won't be able to do this soon, unfortunately.
Thanks a lot for doing this work!!!
Thanks for the comments! FYI, both versions were quantized with the same process, mappings, and calibration dataset; only the recipe changed, from AWQ to GPTQ. The GPTQ run also included a SmoothQuant pass, which may be what makes the difference.
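For anyone wanting to reproduce this, the GPTQ + SmoothQuant combination described above would look roughly like the following llm-compressor recipe. This is only an illustrative sketch, not the actual recipe used for these models: the modifier names (`SmoothQuantModifier`, `GPTQModifier`) exist in llm-compressor, but the specific parameter values here (`smoothing_strength`, targets, ignore list) are assumptions on my part.

```yaml
# Illustrative llm-compressor recipe sketch (not the exact recipe used here).
# The AWQ variant would swap the modifiers below for an AWQ-based recipe
# and drop the SmoothQuant pass.
quant_stage:
  quant_modifiers:
    SmoothQuantModifier:
      smoothing_strength: 0.8   # assumed value, not confirmed by the author
    GPTQModifier:
      targets: ["Linear"]
      ignore: ["lm_head"]
      scheme: W4A16
```

Since the calibration dataset and mappings were identical across both runs, the SmoothQuant pass is the main variable left, which fits the hypothesis that it explains the quality gap.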