Toyota Research Institute’s Post

Toyota Research Institute reposted this

Benjamin Burchfiel

Senior Manager and Lead - Embodied AI for Robots (LBM Team) @ Toyota Research Institute

Stanford, Berkeley, and TRI, along with collaborators from Google, MIT, and Physical Intelligence, have just released OpenVLA: a fully open-source (code, weights, and data) 7B Vision-Language-Action behavior model for robotics. Similar in approach to Google's RT-2-X, OpenVLA is trained on the Open X-Embodiment and DROID datasets, predicts actions directly from robot sensor input, and is deployable on several common manipulation platforms.

I'm excited to see this work (and future work in the same vein) accelerate robotics research the way models pretrained on ImageNet led to rapid development in computer vision. By providing a base model to explore, experiment with, probe for limitations, iterate upon, and adapt to downstream tasks, OpenVLA will be a useful tool for further research.

A few takeaways from this work:

1. Vision-Language Models (VLMs), even when trained without action data, are surprisingly effective base models for learning single skills.

2. Strong single-skill approaches (like DiffusionPolicy or ACT) perform very well within the distribution of their training data. If your test conditions are IID relative to your training data, fitting that data with an expressive model works well. The true promise of pretrained base models lies in robustness, generalization, and graceful failure, as seen in other areas of machine learning.

3. Inference speed remains quite reasonable with 7B models. OpenVLA doesn't require cloud resources to deploy; it can run locally on a consumer GPU at multiple inference cycles per second, even without optimization techniques like compilation or speculative decoding. There is still considerable room for further improvement.

Congratulations to lead authors Moo Jin Kim, Karl Pertsch, and Siddharth Karamcheti, and to collaborators Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, and Chelsea Finn.
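For anyone who wants to kick the tires, here is a minimal sketch of querying the model for an action, loosely following the project's README. It assumes the Hugging Face release under the model id openvla/openvla-7b and its predict_action helper; the camera frame and the downstream robot controller are placeholders you would swap for your own platform's interfaces.

```python
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

# Load the processor and the 7B VLA from the Hugging Face Hub
# (model id assumed to be "openvla/openvla-7b").
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    torch_dtype=torch.bfloat16,   # bf16 weights fit on a 24 GB consumer GPU
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

# Placeholder observation: in practice this frame comes from your robot's camera.
image = Image.new("RGB", (224, 224))
prompt = "In: What action should the robot take to pick up the mug?\nOut:"

# The model decodes an action vector, un-normalized with the statistics of the
# chosen training mix ("bridge_orig" here, i.e. BridgeData-style normalization).
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)

print(action)  # a NumPy action vector; hand this to your robot's controller
```

Run in a loop, that is essentially the whole control stack: camera frame and language instruction in, low-level action out, once per control step.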

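And a quick way to sanity-check the inference-speed point in takeaway 3 on your own hardware, continuing from the sketch above (the throughput you see will depend on your GPU, dtype, and attention implementation):

```python
import time
import torch

# Continues from the loading sketch above (`vla`, `processor`, `image`, `prompt`).
n_trials = 20
torch.cuda.synchronize()
start = time.perf_counter()
for _ in range(n_trials):
    inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
    vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
torch.cuda.synchronize()
per_step = (time.perf_counter() - start) / n_trials
print(f"~{1.0 / per_step:.1f} actions/s ({per_step * 1000:.0f} ms per action)")
```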
Pulkit Gaindhar

Accelerating Mobility with Tech @ Berylls by AlixPartners


This is huge! What I’m curious about is whether feedback and data from various applications, smaller research labs, or hobbyists can be integrated back into the development cycle. This could create a pretty good edge-based solution!

Hemang Purohit

Roboticist | Spatial AI | Embodied AI


This is great, thanks for sharing! Is there a similar open-source VLA model for autonomous navigation, like GOAT? Thank you

