[GRPO] initial GRPO trainer #1954
base: main
Thank you for working on this nifty algorithm @saisurbehera! I see you're basing your implementation on Would you mind adapting your implementation to this new API? Since GRPO is somewhat similar to RLOO, you might find it is possible to copy-paste a large part of that code: https://github.com/huggingface/trl/blob/main/trl/trainer/rloo_trainer.py
Sure, I can make the changes, similar to PPOtrainerv2.
Hello @lewtun, I ported the format to the new methodology; it was way simpler than the first version. I still have to do some validation and testing.
Implementation of the DeepSeekMath GRPO: https://arxiv.org/pdf/2402.03300
Still a work in progress
closes #2103
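For readers skimming the thread, here is a rough sketch of why GRPO and RLOO are described above as similar: both score each sampled completion against the other completions for the same prompt instead of using a learned value function. This is not code from this PR. The function names are hypothetical, and the population standard deviation and the `eps` term are assumptions made for illustration. The GRPO formula follows the outcome-supervision variant in the DeepSeekMath paper, where each reward is normalized by the group mean and standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each reward is centered on the group mean and divided by the group
    standard deviation. Using the population std and adding `eps` to the
    denominator are illustrative assumptions, not taken from this PR.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """RLOO-style advantages: each reward minus the mean of the other rewards."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored by a hypothetical 0/1 reward.
    group = [1.0, 0.0, 0.0, 1.0]
    print(group_relative_advantages(group))
    print(rloo_advantages(group))
```

Both functions operate on a single group. A trainer would apply the same computation per prompt across the batch, typically as tensor operations, before feeding the advantages into the policy-gradient loss.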