adding DRO trainer #2383

Open
2 of 3 tasks
morLev opened this issue Nov 22, 2024 · 1 comment
Labels
✨ enhancement New feature or request 🙋 help wanted Open invitation for community members to contribute

Comments

@morLev

morLev commented Nov 22, 2024

Method description

Hello!
I’m considering implementing direct reward optimization (DRO) from this paper.
However, I'm unsure if it aligns with the contribution guidelines.

Here’s a comparison between DRO and KTO, in which DRO performs better:
[image: DRO vs. KTO performance comparison]

Another advantage of this method is that it trains on a dataset that assigns a score to each individual example, rather than on a pairwise preference dataset as DPO does.
From the DRO paper:
"Second and more importantly, annotating pairwise data is more expensive and less natural than simply indicating whether a single completion is satisfactory or not, e.g., by assigning a binary thumbs up or down rating to the model completion."

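To make the dataset difference concrete, here is a minimal sketch of the two record shapes. The class names `PairwiseExample` and `ScoredExample`, the field names, and the helper `unpair` are hypothetical and are not taken from the paper or from TRL. The mapping of "chosen" to 1.0 and "rejected" to 0.0 is also an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairwiseExample:
    """DPO-style record: two completions ranked against each other."""
    prompt: str
    chosen: str
    rejected: str


@dataclass
class ScoredExample:
    """Single-completion record: one completion with its own score."""
    prompt: str
    completion: str
    reward: float  # e.g. 1.0 for thumbs up, 0.0 for thumbs down


def unpair(example: PairwiseExample) -> List[ScoredExample]:
    """Split one pairwise record into two independently scored records.

    Assumption for illustration only: chosen -> 1.0, rejected -> 0.0.
    """
    return [
        ScoredExample(example.prompt, example.chosen, 1.0),
        ScoredExample(example.prompt, example.rejected, 0.0),
    ]


pair = PairwiseExample(
    prompt="Translate 'bonjour' to English.",
    chosen="hello",
    rejected="goodbye",
)
scored = unpair(pair)
```

A scored record needs only one completion per prompt, so it can be collected from a single thumbs up or down rating without ever showing the annotator a second completion.
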
Open source status

  • The method implementation is available
  • The model weights are available
  • The training datasets are available

Provide useful links for the implementation

paper: https://arxiv.org/pdf/2405.19107
weights: google-t5 (t5-large and t5-3b)
dataset: openbmb/UltraFeedback

@qgallouedec
Member

Hi! Thanks for the suggestion. It could be a great addition. I haven't read the paper in detail yet, but what you describe sounds closer to KTO than DPO, doesn't it?
Do you have an implementation that already works?

@qgallouedec qgallouedec added ✨ enhancement New feature or request 🙋 help wanted Open invitation for community members to contribute labels Nov 24, 2024