Insights: huggingface/trl
September 21, 2024 – September 28, 2024
Overview
1 Release published by 1 person
- v0.11.1, published Sep 24, 2024
22 Pull requests merged by 5 people
- Rename `dpo_visual.py` example to `dpo_vlm.py` (#2139, merged Sep 27, 2024)
- 🃏 Model card for TRL (#2123, merged Sep 27, 2024)
- Add correct label for `WinRateCallback` table (#2134, merged Sep 27, 2024)
- arXiv to HF Papers (#2133, merged Sep 27, 2024)
- 🧹 Style (#2132, merged Sep 26, 2024)
- Add table for `WinRateCallback` (#2116, merged Sep 26, 2024)
- ♻️ Standardize `script_args` (#2130, merged Sep 26, 2024)
- Tokenize row in `training_step` (#2117, merged Sep 26, 2024)
- EOS token encouragement clarification (#2128, merged Sep 26, 2024)
- Standardize pushing to Hub in examples (#2126, merged Sep 26, 2024)
- Remove `max_length` from `RewardDataCollatorWithPadding` (#2119, merged Sep 26, 2024)
- Update example_overview.md (#2125, merged Sep 25, 2024)
- Generalizes VSFT script to support REDACTED (#2120, merged Sep 25, 2024)
- `BCOTrainer` conversational dataset support (#2107, merged Sep 24, 2024)
- Fix pack test (#2111, merged Sep 24, 2024)
- [online-dpo] allow parse-args as list of floats (#2108, merged Sep 24, 2024)
- Fix formatting (#2109, merged Sep 24, 2024)
- Fix documentation links (#2105, merged Sep 24, 2024)
- [RewardTrainer] Tokenize inputs within trainer (#2102, merged Sep 24, 2024)
- [CLI] `trl env` for printing system info (#2104, merged Sep 24, 2024)
- Fix PPO/RLOO examples (#2100, merged Sep 23, 2024)
- Clean up README and remove openrlbenchmark dependency (#2085, merged Sep 23, 2024)
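Among the merged PRs above, #2108 allows an online-DPO option to be parsed as a list of floats from the command line. A minimal sketch of how such parsing is commonly done with `argparse` — the flag name `--betas` here is purely illustrative, not the actual TRL option:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" accepts one or more space-separated values;
# type=float converts each one individually.
parser.add_argument("--betas", type=float, nargs="+", default=[0.1])

args = parser.parse_args(["--betas", "0.1", "0.25", "0.5"])
print(args.betas)  # [0.1, 0.25, 0.5]
```

On the command line this corresponds to `--betas 0.1 0.25 0.5`, with the default list used when the flag is omitted.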
5 Pull requests opened by 5 people
- Fix RLOO checkpointing (#2114, opened Sep 24, 2024)
- [SCoRE] initial score stage 1 (#2115, opened Sep 24, 2024)
- [DRAFT] Process-supervised RM Trainer (#2127, opened Sep 26, 2024)
- DPO trainer supports num_logits_to_keep to save memory (#2129, opened Sep 26, 2024)
- Conversational dataset support for `DPOTrainer` (#2131, opened Sep 26, 2024)
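PR #2129 above proposes using `num_logits_to_keep` in the DPO trainer to save memory. The underlying idea: when only the logits for the last few positions are needed, the language-model head does not have to be applied to every hidden state. A toy sketch of that idea with plain Python lists (shapes and names are illustrative, not TRL's implementation):

```python
def lm_head(hidden, weight):
    # Project each per-position feature vector through a vocab x dim matrix.
    return [[sum(h * w for h, w in zip(vec, row)) for row in weight] for vec in hidden]

def logits_last_k(hidden_states, weight, num_logits_to_keep):
    # Slice first, then project: only the final k positions pay the
    # cost (and memory) of the vocabulary-sized projection.
    return lm_head(hidden_states[-num_logits_to_keep:], weight)

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # seq_len=3, dim=2
weight = [[1.0, 2.0], [3.0, 4.0]]              # vocab=2, dim=2
print(logits_last_k(hidden, weight, 2))        # logits for the last 2 positions only
```

With a real vocabulary of ~100k entries, skipping the projection for most of the sequence is where the memory saving comes from.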
9 Issues closed by 7 people
- Multiple Processes Spawning As A Result of SFTTrainer (#2084, closed Sep 27, 2024)
- Policy has no attribute 'zero_gather_16bit_weights_on_model_save' (#2122, closed Sep 27, 2024)
- [WinRateCallback] Log table of completions to WandB (#2099, closed Sep 26, 2024)
- RewardTrainer warns about max_length (#2118, closed Sep 26, 2024)
- DDPO job with Accelerator fails in a multi-gpu node (#2090, closed Sep 25, 2024)
- XPO cannot work (#2106, closed Sep 24, 2024)
- https://github.com/huggingface/trl/blob/main/examples/notebooks/best_of_n.ipynb (#2088, closed Sep 24, 2024)
- Group Relative Policy Optimization Trainer (#1583, closed Sep 24, 2024)
- Support for more trainers in CLI (#1811, closed Sep 23, 2024)
12 Issues opened by 8 people
- `SFTTrainer` Raises NotImplementedError with `IterableDataset` (#2138, opened Sep 27, 2024)
- Use `unittest`'s methods for the tests (#2137, opened Sep 27, 2024)
- [SFT VLM] Add support for Molmo models (#2136, opened Sep 27, 2024)
- SFT_vlm script is missing the chat template (#2135, opened Sep 27, 2024)
- RLOO generating checkpoints every 2 steps (#2124, opened Sep 25, 2024)
- [RewardTrainer] Change print_rich_table parameters during Reward Model training (#2121, opened Sep 25, 2024)
- Diffusion model generating identical images after DDPO fine-tuning (#2113, opened Sep 24, 2024)
- [Data] Implement dataset mixer for combining datasets in training (#2112, opened Sep 24, 2024)
- [Reward Modelling] Add support for process / stepwise supervision (#2110, opened Sep 24, 2024)
- GRPO as part of HF TRL? (#2103, opened Sep 23, 2024)
- [CLI] Extend training support to all trainers (#2101, opened Sep 23, 2024)
- Support for PPOTrainer / DPOTrainer with Qwen2Audio (#2097, opened Sep 22, 2024)
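Issue #2112 above requests a dataset mixer for combining datasets during training. One common approach is weighted interleaving: pick a source dataset in proportion to a mixing weight for each example drawn. A minimal sketch of that idea, with all names and the sampling scheme purely illustrative (TRL may settle on a different design):

```python
import random

def mix_datasets(datasets, weights, num_samples, seed=0):
    """Draw num_samples examples, choosing a source dataset in proportion
    to weights and stepping through each dataset's examples in order."""
    rng = random.Random(seed)  # seeded for reproducible mixing
    positions = [0] * len(datasets)
    mixed = []
    for _ in range(num_samples):
        i = rng.choices(range(len(datasets)), weights=weights)[0]
        ds = datasets[i]
        mixed.append(ds[positions[i] % len(ds)])  # wrap around short datasets
        positions[i] += 1
    return mixed

chat = [{"text": f"chat-{i}"} for i in range(3)]
code = [{"text": f"code-{i}"} for i in range(3)]
sample = mix_datasets([chat, code], weights=[0.8, 0.2], num_samples=5)
```

In expectation, roughly 80% of the drawn examples come from `chat` and 20% from `code`; a streaming variant of the same idea works for iterable datasets.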
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat: add support for packing tokenized datasets (#2011, commented on Sep 28, 2024 • 3 new comments)
- RLOOTrainer & PPOv2Trainer - Modify Name for W&B Logged Table (#2045, commented on Sep 23, 2024 • 0 new comments)
- PPOv2Trainer with DeepSpeed ZeRO-3 CPU offload: "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!" (#1891, commented on Sep 23, 2024 • 0 new comments)
- Support KTO for MLLM (#2091, commented on Sep 23, 2024 • 0 new comments)
- No v_head weight is found (#2095, commented on Sep 24, 2024 • 0 new comments)
- DeepSpeed ZeRO-2 not working when using DPOTrainer (#2062, commented on Sep 24, 2024 • 0 new comments)
- [Tracking issue] General dataset support (#2071, commented on Sep 27, 2024 • 0 new comments)
- [DRAFT] Vllm integration (#1628, commented on Sep 23, 2024 • 0 new comments)
- Prototype Dataset Processor (#1646, commented on Sep 23, 2024 • 0 new comments)
- Add simplified version of BCO loss (#1731, commented on Sep 24, 2024 • 0 new comments)
- added initial TPO implementation (#1965, commented on Sep 25, 2024 • 0 new comments)