Eval Requests

#587
by MuXodious - opened

I have hereticated Qwen3.5 4B with two sets of refusal markers and three distinct abliteration techniques to demonstrate and understand the effects of each approach.

The unversioned (v1) models target only primary refusals.
The v2 models target both primary refusals and secondary factors.

MPOA only
https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy-v2

ARA
NOTE: ARA is still in development and currently exists only as a draft PR.
https://huggingface.co/MuXodious/Qwen3.5-4B-ARA-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-ARA-heresy-v2

SOMA + MPOA
NOTE: SOMA is still in development, and its PR has not been merged yet.
https://huggingface.co/MuXodious/Qwen3.5-4B-SOMPOA-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-SOMPOA-heresy-v2

BUMP

The collective results from all of these will be valuable info.

@darkc0de I plan to prepare some Ministral and Llama models with the same approach later to expand the available data on the matter. Qwen3.5 does not seem to me to be the best platform for demonstration. The model is just too weird. Still, these Qwen3.5 models should provide valuable insights.
