Spaces: DontPlanToEnd/UGI-Leaderboard

Eval Requests

#587

by MuXodious - opened 7 days ago

Discussion

MuXodious

7 days ago • edited 7 days ago

I have hereticated Qwen3.5 4B with two sets of refusal markers and three distinct abliteration techniques to demonstrate and understand the effects of each approach.

The unversioned (v1) models target only primary refusals.
The v2 models target both primary refusals and secondary factors.

MPOA only
https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-PaperWitch-heresy-v2

ARA
NOTE: ARA is still in development and currently exists only as a draft PR.
https://huggingface.co/MuXodious/Qwen3.5-4B-ARA-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-ARA-heresy-v2

SOMA + MPOA
NOTE: SOMA is still in development, and its PR hasn't been merged yet.
https://huggingface.co/MuXodious/Qwen3.5-4B-SOMPOA-heresy
https://huggingface.co/MuXodious/Qwen3.5-4B-SOMPOA-heresy-v2

darkc0de

7 days ago

BUMP

The collective results from all of these will be valuable info.

MuXodious

7 days ago • edited 6 days ago

@darkc0de I plan to prepare some Ministral and Llama models with the same approach later to expand the available data on the matter. Qwen3.5 does not seem to me to be the best platform for a demonstration; the model is just too weird. Still, these models should provide valuable insights.
