Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
Paper
• 2504.09895 • Published
Datasets and models for the paper "Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data".