Stars
SAPIEN Manipulation Skill Framework, a GPU parallelized robotics simulator and benchmark
GraspSplats: Efficient Manipulation with 3D Feature Splatting
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
All files for research proposal and bachelor thesis on Quantum Machine Learning at the University of KwaZulu-Natal in Durban, South Africa.
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
[ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Ongoing research on training Gaussian splatting at scale with a distributed system
WildGaussians: 3D Gaussian Splatting In the Wild
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
[RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Open-Sora: Democratizing Efficient Video Production for All
Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"
GenSim: Generating Robotic Simulation Tasks via Large Language Models
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
[CVPR 2024] Official implementation of the paper "Visual In-context Learning"
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with a web UI
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks