PEFT
Safetensors
dpo
llama-3
llama-3-8b
lora
m2
trl
llama3-8b-dpo-lora / training_args.bin

Commit History

unsloth_gpt-oss-20b__proj-llama3-dpo-m2__data-trl-lib_hh-rlhf-helpful-base__beta-0.1__ebs-16__seed-42: DPO adapter upload (base: unsloth/gpt-oss-20b)
6e5550d
verified

AshwinKM2005 committed on