/
lemm.ee
Search
Explore
Create
Machine Learning - Theory | Research
manitcor
•
1y ago
•
100%
Direct Preference Optimization - Your Language Model is Secretly a Reward Model
https://arxiv.org/pdf/2305.18290.pdf
1
0
Comments
0
Hot
Top
New