ORPO: Preference Optimization without the Supervised Fine-tuning (SFT) Step


A much cheaper alignment method performing as well as DPO
