Stars
A repository accompanying the PARTNR benchmark for using Large Planning Models (LPMs) to solve Human-Robot Collaboration or Robot Instruction Following tasks in the Habitat simulator.
MichalZawalski / embodied-CoT
Forked from openvla/openvla. Embodied Chain of Thought: a robotic policy that reasons to solve the task.
Heterogeneous Pre-trained Transformer (HPT) as a Scalable Policy Learner.
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
[ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[NeurIPS 2024] A Generalizable World Model for Autonomous Driving
[CVPR 2024 Highlight] GenAD: Generalized Predictive Model for Autonomous Driving & Foundation Models in Autonomous System
Painter & SegGPT Series: Vision Foundation Models from BAAI
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A TensorFlow implementation of Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
Starter Kit for NeurIPS 2020 - Procgen Competition on AIcrowd
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
[ECCV 2024] VideoMamba: State Space Model for Efficient Video Understanding
PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Code for the ICLR 2024 spotlight paper: "Learning to Act without Actions" (introducing Latent Action Policies)
Suite of human-collected datasets and a multi-task continuous control benchmark for open vocabulary visuolinguomotor learning.
Official repository of Learning to Act from Actionless Videos through Dense Correspondences.
Transformers with Arbitrarily Large Context
Large World Model -- Modeling Text and Video with Millions of Tokens of Context
This repo contains the code for a 1D tokenizer and generator.
[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
[ECCV 2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV 2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images