
Eval request: kabachuha/gpt-oss-20b-SOMbliterated

#571

by kabachuha - opened 3 days ago

Discussion

kabachuha

3 days ago

https://huggingface.co/kabachuha/gpt-oss-20b-SOMbliterated

gpt-oss-20b - SOMbliterated

This is a SOMbliterated (decensored) version of openai/gpt-oss-20b, made using Heretic v1.2.0 with Pull Request https://github.com/p-e-w/heretic/pull/196, which adds multi-directional abliteration with the directions determined by trainable self-organizing neural networks (Self-Organizing Maps, also known as Kohonen networks).
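For readers unfamiliar with Kohonen networks, here is a minimal sketch of the generic self-organizing map update rule on toy 2-D data. It is not the code from the pull request: the function name `train_som`, the 1-D node layout, the decay schedules, and the toy samples are all assumptions made for illustration, and how the PR turns trained node weights into ablation directions is not shown here.

```python
import math
import random

def train_som(samples, n_nodes=5, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map (hypothetical helper, generic Kohonen rule)."""
    rng = random.Random(seed)
    # Initialise each node's weight vector from a randomly chosen sample.
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 1e-3)  # neighbourhood width shrinks
        for x in rng.sample(samples, len(samples)):
            # Best matching unit: the node closest to the sample.
            bmu = min(
                range(n_nodes),
                key=lambda i: sum((w - a) ** 2 for w, a in zip(nodes[i], x)),
            )
            # Pull the BMU and its grid neighbours towards the sample.
            for i in range(n_nodes):
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                nodes[i] = [w + lr * h * (a - w) for w, a in zip(nodes[i], x)]
    return nodes

# Toy data: two small clusters in 2-D (illustrative values only).
samples = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [1.0, 0.9], [0.9, 1.0], [1.1, 1.0]]
nodes = train_som(samples)
```

Because every update is a convex combination of a node and a sample, the trained nodes stay inside the region covered by the data and spread out along its structure.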

The approach assumes that in recent advanced neural networks the refusal concept is not a single direction but a complex manifold, much as numbers and days of the week are encoded in circles or helices. This manifold is now removed more surgically, from multiple sides, giving a precise ablation instead of a complete lobotomy.
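As a rough illustration of what removing several directions at once can look like, the sketch below projects a hidden-state vector onto the orthogonal complement of a set of directions. This is a generic linear-algebra example, not Heretic's implementation: the helper names `orthonormalize` and `ablate` and the toy vectors are assumptions.

```python
import math

def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of `directions`."""
    basis = []
    for d in directions:
        v = list(d)
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > eps:  # skip directions already covered by the basis
            basis.append([x / norm for x in v])
    return basis

def ablate(hidden, directions):
    """Remove the component of `hidden` that lies in the span of `directions`."""
    out = list(hidden)
    for b in orthonormalize(directions):
        dot = sum(x * y for x, y in zip(out, b))
        out = [x - dot * y for x, y in zip(out, b)]
    return out

# Toy example: a 4-D hidden state and two non-orthogonal directions.
hidden = [1.0, 2.0, 3.0, 4.0]
refusal_dirs = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
cleaned = ablate(hidden, refusal_dirs)
```

Orthonormalizing first matters when the directions overlap: subtracting each raw projection in turn would remove the shared component more than once.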

The method is based on the amazing work https://arxiv.org/abs/2511.08379v2.

This particular abliteration used five directions.

Performance

Metric	This model	Original model (openai/gpt-oss-20b)
KL divergence	0.1166	0 (by definition)
Refusals	3/100	100/100
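To make the "0 (by definition)" entry concrete, here is a small sketch of the standard KL divergence between two next-token distributions. It only illustrates the textbook formula; which prompts, token positions, and argument order Heretic uses for its reported figure are not stated in this post, and the logits below are made-up values.

```python
import math

def softmax(logits):
    """Convert logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping zero-probability terms."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits standing in for the original and the modified model.
original = softmax([2.0, 1.0, 0.1])
modified = softmax([1.8, 1.2, 0.1])

kl_same = kl_divergence(original, original)  # identical distributions
kl_diff = kl_divergence(original, modified)  # slightly shifted distribution
```

Comparing a model with itself gives exactly zero, while any change to the output distribution gives a positive value, so a lower figure means the modified model stays closer to the original.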

As of 2026-02-27, this is the lowest number of refusals for a Heretic-processed gpt-oss-20b that I have seen on Hugging Face. See the comparison with the other available models on GitHub.

DontPlanToEnd changed discussion status to closed 2 days ago
