PEFT
Safetensors
dpo
llama-3
llama-3-8b
lora
m2
trl
AshwinKM2005
unsloth_gpt-oss-20b__proj-llama3-dpo-m2__data-trl-lib_hh-rlhf-helpful-base__beta-0.1__ebs-16__seed-42: DPO adapter upload (base: unsloth/gpt-oss-20b)
6e5550d verified