Insights: huggingface/trl
September 21, 2024 – September 28, 2024
Overview
1 Release published by 1 person
- v0.11.1, published Sep 24, 2024
22 Pull requests merged by 5 people
- Rename `dpo_visual.py` example to `dpo_vlm.py` (#2139, merged Sep 27, 2024)
- 🃏 Model card for TRL (#2123, merged Sep 27, 2024)
- Add correct label for `WinRateCallback` table (#2134, merged Sep 27, 2024)
- arXiv to HF Papers (#2133, merged Sep 27, 2024)
- 🧹 Style (#2132, merged Sep 26, 2024)
- Add table for `WinRateCallback` (#2116, merged Sep 26, 2024)
- ♻️ Standardize `script_args` (#2130, merged Sep 26, 2024)
- Tokenize row in `training_step` (#2117, merged Sep 26, 2024)
- EOS token encouragement clarification (#2128, merged Sep 26, 2024)
- Standardize pushing to Hub in examples (#2126, merged Sep 26, 2024)
- Remove `max_length` from `RewardDataCollatorWithPadding` (#2119, merged Sep 26, 2024)
- Update example_overview.md (#2125, merged Sep 25, 2024)
- Generalizes VSFT script to support REDACTED (#2120, merged Sep 25, 2024)
- `BCOTrainer` conversational dataset support (#2107, merged Sep 24, 2024)
- Fix pack test (#2111, merged Sep 24, 2024)
- [online-dpo] allow parse-args as list of floats (#2108, merged Sep 24, 2024)
- Fix formatting (#2109, merged Sep 24, 2024)
- Fix documentation links (#2105, merged Sep 24, 2024)
- [RewardTrainer] Tokenize inputs within trainer (#2102, merged Sep 24, 2024)
- [CLI] `trl env` for printing system info (#2104, merged Sep 24, 2024)
- Fix PPO/RLOO examples (#2100, merged Sep 23, 2024)
- Clean up README and remove openrlbenchmark dependency (#2085, merged Sep 23, 2024)
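Among the merged PRs above, #2108 allows an online-DPO option to be parsed as a list of floats from the command line. A minimal sketch of how such parsing is commonly done with `argparse` — the flag name `--betas` here is purely illustrative, not the actual TRL option:

```python
import argparse

parser = argparse.ArgumentParser()
# nargs="+" accepts one or more space-separated values;
# type=float converts each one individually.
parser.add_argument("--betas", type=float, nargs="+", default=[0.1])

args = parser.parse_args(["--betas", "0.1", "0.25", "0.5"])
print(args.betas)  # [0.1, 0.25, 0.5]
```

On the command line this corresponds to `--betas 0.1 0.25 0.5`, with the default list used when the flag is omitted.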
5 Pull requests opened by 5 people
- Fix RLOO checkpointing (#2114, opened Sep 24, 2024)
- [SCoRE] initial score stage 1 (#2115, opened Sep 24, 2024)
- [DRAFT] Process-supervised RM Trainer (#2127, opened Sep 26, 2024)
- DPO trainer supports num_logits_to_keep to save memory (#2129, opened Sep 26, 2024)
- Conversational dataset support for `DPOTrainer` (#2131, opened Sep 26, 2024)
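PR #2129 above proposes using `num_logits_to_keep` in the DPO trainer to save memory. The underlying idea: when only the logits for the last few positions are needed, the language-model head does not have to be applied to every hidden state. A toy sketch of that idea with plain Python lists (shapes and names are illustrative, not TRL's implementation):

```python
def lm_head(hidden, weight):
    # Project each per-position feature vector through a vocab x dim matrix.
    return [[sum(h * w for h, w in zip(vec, row)) for row in weight] for vec in hidden]

def logits_last_k(hidden_states, weight, num_logits_to_keep):
    # Slice first, then project: only the final k positions pay the
    # cost (and memory) of the vocabulary-sized projection.
    return lm_head(hidden_states[-num_logits_to_keep:], weight)

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # seq_len=3, dim=2
weight = [[1.0, 2.0], [3.0, 4.0]]              # vocab=2, dim=2
print(logits_last_k(hidden, weight, 2))        # logits for the last 2 positions only
```

With a real vocabulary of ~100k entries, skipping the projection for most of the sequence is where the memory saving comes from.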
9 Issues closed by 7 people
- Multiple Processes Spawning As A Result of SFTTrainer (#2084, closed Sep 27, 2024)
- Policy has no attribute 'zero_gather_16bit_weights_on_model_save' (#2122, closed Sep 27, 2024)
- [WinRateCallback] Log table of completions to WandB (#2099, closed Sep 26, 2024)
- RewardTrainer warns about max_length (#2118, closed Sep 26, 2024)
- DDPO job with Accelerator fails in a multi-gpu node (#2090, closed Sep 25, 2024)
- XPO cannot work (#2106, closed Sep 24, 2024)
- https://github.com/huggingface/trl/blob/main/examples/notebooks/best_of_n.ipynb (#2088, closed Sep 24, 2024)
- Group Relative Policy Optimization Trainer (#1583, closed Sep 24, 2024)
- Support for more trainers in CLI (#1811, closed Sep 23, 2024)
12 Issues opened by 8 people
- `SFTTrainer` Raises NotImplementedError with `IterableDataset` (#2138, opened Sep 27, 2024)
- Use `unittest`'s methods for the tests (#2137, opened Sep 27, 2024)
- [SFT VLM] Add support for Molmo models (#2136, opened Sep 27, 2024)
- SFT_vlm script is missing the chat template (#2135, opened Sep 27, 2024)
- RLOO generating checkpoints every 2 steps (#2124, opened Sep 25, 2024)
- [RewardTrainer] Change print_rich_table parameters during Reward Model training (#2121, opened Sep 25, 2024)
- Diffusion model generating identical images after DDPO fine-tuning (#2113, opened Sep 24, 2024)
- [Data] Implement dataset mixer for combining datasets in training (#2112, opened Sep 24, 2024)
- [Reward Modelling] Add support for process / stepwise supervision (#2110, opened Sep 24, 2024)
- GRPO as part of HF TRL? (#2103, opened Sep 23, 2024)
- [CLI] Extend training support to all trainers (#2101, opened Sep 23, 2024)
- Support for PPOTrainer / DPOTrainer with Qwen2Audio (#2097, opened Sep 22, 2024)
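Issue #2112 above requests a dataset mixer for combining datasets during training. One common approach is weighted interleaving: pick a source dataset in proportion to a mixing weight for each example drawn. A minimal sketch of that idea, with all names and the sampling scheme purely illustrative (TRL may settle on a different design):

```python
import random

def mix_datasets(datasets, weights, num_samples, seed=0):
    """Draw num_samples examples, choosing a source dataset in proportion
    to weights and stepping through each dataset's examples in order."""
    rng = random.Random(seed)  # seeded for reproducible mixing
    positions = [0] * len(datasets)
    mixed = []
    for _ in range(num_samples):
        i = rng.choices(range(len(datasets)), weights=weights)[0]
        ds = datasets[i]
        mixed.append(ds[positions[i] % len(ds)])  # wrap around short datasets
        positions[i] += 1
    return mixed

chat = [{"text": f"chat-{i}"} for i in range(3)]
code = [{"text": f"code-{i}"} for i in range(3)]
sample = mix_datasets([chat, code], weights=[0.8, 0.2], num_samples=5)
```

In expectation, roughly 80% of the drawn examples come from `chat` and 20% from `code`; a streaming variant of the same idea works for iterable datasets.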
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- feat: add support for packing tokenized datasets (#2011, commented on Sep 28, 2024 • 3 new comments)
- RLOOTrainer & PPOv2Trainer - Modify Name for W&B Logged Table (#2045, commented on Sep 23, 2024 • 0 new comments)
- PPOv2Trainer with DeepSpeed ZeRO-3 CPU offload: "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!" (#1891, commented on Sep 23, 2024 • 0 new comments)
- Support KTO for MLLM (#2091, commented on Sep 23, 2024 • 0 new comments)
- No v_head weight is found (#2095, commented on Sep 24, 2024 • 0 new comments)
- DeepSpeed ZeRO-2 not working when using DPOTrainer (#2062, commented on Sep 24, 2024 • 0 new comments)
- [Tracking issue] General dataset support (#2071, commented on Sep 27, 2024 • 0 new comments)
- [DRAFT] Vllm integration (#1628, commented on Sep 23, 2024 • 0 new comments)
- Prototype Dataset Processor (#1646, commented on Sep 23, 2024 • 0 new comments)
- Add simplified version of BCO loss (#1731, commented on Sep 24, 2024 • 0 new comments)
- added initial TPO implementation (#1965, commented on Sep 25, 2024 • 0 new comments)