Stars
World modeling challenge for humanoid robots
A recipe for online RLHF and online iterative DPO.
Recipes to train reward models for RLHF.
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
⚡️ Shockingly fast imitation learning algorithms via combining online and offline data engines. ⚡️
The official implementation of Self-Play Fine-Tuning (SPIN)
Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
Contains JAX implementation of algorithms for inverse reinforcement learning
Learning Shared Safety Constraints from Multi-Task Demonstrations (NeurIPS 2023)
Train a language model to answer Slack messages as you.
🚀 A fast safe reinforcement learning library in PyTorch
KwaiRec: A Fully-observed Dataset for Recommender Systems.
Official repo for consistency models.
Code and documentation to train Stanford's Alpaca models, and generate the data.
A modern, highly customizable, responsive Jekyll theme for documentation with built-in search.
A C++14-compatible physical units library with no dependencies and a single-file delivery option. Emphasis on safety, accessibility, performance, and developer experience.
yudasong / DQfD
Forked from felix-kerkhoff/DQfD
An implementation of Deep Q-Learning from Demonstrations (DQfD) for playing Atari 2600 video games
Course materials for Advanced Topics in Statistical Learning, Spring 2023
Simple (but often Strong) Baselines for POMDPs in PyTorch, ICML 2022
Differentiable Optimization-Based Modeling for Machine Learning
A data-driven, fast driving simulator for multi-agent coordination under partial observability.
Prototyping robots for PyBullet (F1/10 MIT Racecar, Sawyer, Baxter and Dobot arm, Boston Dynamics Atlas and Botlab environment)