Issues: huggingface/trl

Issues list

How to calculate the loss of multi-turn dialogue training data? [labels: ❓ question, 🏋 SFT]
#2424 opened Dec 2, 2024 by NUMB1234
precompute_ref_log_probs not working correctly? [labels: 🏋 DPO]
#2423 opened Dec 2, 2024 by dakru012
7 of 9 tasks
Let DPOTrainer Support padding_free [labels: 🏋 DPO, ✨ enhancement, 🧒 good second issue, 🙋 help wanted]
#2422 opened Dec 1, 2024 by fzyzcjy
Online DPO Quite Slow Compared with Previous Versions
#2416 opened Nov 29, 2024 by zcw0201
7 of 9 tasks
Online DPO Meets Error When Using Deepspeed for Speed Up. [labels: 🐛 bug, 🚀 deepspeed, 🏋 Online DPO]
#2410 opened Nov 28, 2024 by zcw0201
7 of 9 tasks
RLOO Trainer do not support peft lora
#2404 opened Nov 28, 2024 by harvinyou
7 of 9 tasks
RLOO Trainer Stopping After 1 Epoch
#2401 opened Nov 27, 2024 by asparius
7 of 9 tasks
SFTTrainer usage [labels: ❓ question, 🏋 SFT]
#2390 opened Nov 25, 2024 by Humauaca
2 of 3 tasks
eos_token config in PPOTrainer [labels: ✨ enhancement, 👶 good first issue, 🏋 PPO]
#2387 opened Nov 23, 2024 by kechunFIVE
adding DRO trainer [labels: ✨ enhancement, 🙋 help wanted]
#2383 opened Nov 22, 2024 by morLev
2 of 3 tasks
DPO does not work for FIM task with non-instruct model [labels: 🏋 DPO, ❓ question]
#2382 opened Nov 22, 2024 by AML14
7 of 9 tasks
ValueError: Predictions and/or references don't match the expected format. [labels: 🐛 bug, 🏋 SFT]
#2376 opened Nov 20, 2024 by scarafoni
3 of 4 tasks
The DPO reward accuracy value is only 0 or 1 [labels: 🏋 DPO, ⏳ needs more info, ❓ question]
#2371 opened Nov 20, 2024 by carrot0117
PPO manual reward functions [labels: 🏋 PPO, ❓ question]
#2363 opened Nov 18, 2024 by schmidtj3
Contributing new distillation related trainers
#2361 opened Nov 16, 2024 by YihanCao123
1 of 3 tasks
How to train from scratch? Can you provide the code [labels: ❓ question]
#2356 opened Nov 14, 2024 by sankexin
5 of 9 tasks