For the second week in a row, Harmonious' spotlight paper is about a new benchmark: BLINK: Multimodal Large Language Models Can See but Not Perceive. Its authors are with UPenn, U Washington, AI2, UC Davis, and Columbia U.

BLINK is a benchmark of 14 visual perception tasks that humans can solve “within a blink” but that pose significant challenges for current multimodal LLMs, since they resist mediation through natural language (e.g. dense captioning). While humans reach 96% accuracy, the best-performing models GPT-4V, Gemini Pro, and Claude Opus achieve only 51%, 45%, and 43% respectively, not much better than random guessing (38%). This indicates that such perception abilities have not “emerged” yet in recent multimodal LLMs. Notably, on certain tasks some multimodal LLMs even underperform random guessing, while specialist computer vision models solve these problems much better.

Read our analysis on Harmonious at https://lnkd.in/gTMwH72C, where we discuss related topics such as the Moravec paradox, System 1 and System 2 thinking, and the path to AGI, in addition to recommendations for practitioners. Sign up at Harmonious.ai to never miss our weekly paper roundup! #harmonious #ai2incubator
AI2 Incubator’s Post
-
At the beginning of this year, I wrote in Insight #13:

"""
2024 Prediction: VoiceGPTs

We wrap up Insight #13 with our prediction for 2024. Similar to many other 2024 predictions, we anticipate multimodal models to take center stage. We are particularly excited about models that combine text and speech modalities, enabling seamless, end-to-end voice-based conversations. This is in contrast to the current pipelined approach of sandwiching an LLM between a pair of speech-to-text and text-to-speech models, which results in a highly stilted, walkie-talkie-like experience. Multimodal text-and-speech models, which we refer to as VoiceGPTs, will elevate the popular ChatGPT experience beyond the confines of the keyboard. Imagine having a natural conversation about any topic with a VoiceGPT on your Alexa, Siri, or Home device. This is a highly non-trivial technical challenge. We will only see a preview of such technology in 2024.
"""

Fast forward nine months, and I was proven a bit too cautious. First, OpenAI announced GPT-4o in May, with sci-fi-like demos evocative of the movie Her (and drawing the ire of Scarlett Johansson). GPT-4o's voice mode was rolled out to users earlier this month. OpenAI continues to show the way, leaving the rest of the industry (Anthropic, Google, Meta, etc.) scrambling to catch up.

The company closest to catching up, however, is a French AI research lab "with a $330 million budget that will make everything open source". It is called Kyutai. Last week they shared Moshi: the model, weights for Moshi and its Mimi codec, streaming inference code in PyTorch, Rust, and MLX, and a fantastic technical report. Amazing!

Moshi's technical report is our pick for last week's spotlight paper at Harmonious. https://lnkd.in/gFEVEdTq #ai2incubator #harmonious
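To make the contrast concrete, here is a toy sketch of the pipelined ("walkie-talkie") voice stack the post describes. All three stage functions are hypothetical placeholders, not real APIs; the point is only that each stage must fully finish before the next begins, so the user hears nothing until the whole chain completes.

```python
# Toy sketch of the pipelined voice stack: STT -> LLM -> TTS.
# The three stage functions are hypothetical stand-ins, not real APIs.

def speech_to_text(audio: str) -> str:
    return f"transcript({audio})"

def llm_reply(text: str) -> str:
    return f"reply({text})"

def text_to_speech(text: str) -> str:
    return f"audio({text})"

def pipelined_turn(user_audio: str) -> str:
    # Strictly serial: the user hears nothing until all three stages
    # complete, which is what makes the experience feel stilted.
    return text_to_speech(llm_reply(speech_to_text(user_audio)))

print(pipelined_turn("hello"))  # audio(reply(transcript(hello)))
```

End-to-end models like GPT-4o or Moshi instead consume and emit audio tokens directly in one model, which allows streaming, overlapping turns rather than this strict relay.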
Weekly paper roundup: Moshi (9/16/2024)
harmonious.ai
-
Harmonious' spotlight paper this week: OLMoE: Open Mixture-of-Experts Language Models. Authors: Allen Institute for AI, Contextual AI, University of Washington, and Princeton University.

This paper presents OLMoE, a language model built on a sparse Mixture-of-Experts architecture, which achieves remarkable efficiency and performance with its 7 billion total parameters (only about 1 billion of which are active per input token). I found the emphasis on key design choices and the detailed analysis of MoE training particularly insightful. The open-source nature of the work fosters transparency and collaboration in the AI community. However, the high computational resources required for pretraining may limit accessibility for many academic institutions. Lastly, I am curious whether the outcomes observed in smaller models will hold true in significantly larger models. #harmonious #ai2incubator
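The efficiency claim comes from sparsity: a router sends each token to only a few experts, so most parameters sit idle on any given forward pass. Below is a minimal NumPy sketch of top-k expert routing — a generic illustration of the technique, not OLMoE's actual implementation (layer sizes, routing, and normalization here are made up for clarity).

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, expert_weights, gate_weights, k=2):
    """Toy sparse MoE layer: route one token to its top-k experts.

    x: (d,) hidden state for a single token
    expert_weights: list of (d, d) matrices, one per expert
    gate_weights: (n_experts, d) router matrix
    """
    logits = gate_weights @ x              # one router score per expert
    top_k = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    # softmax over only the selected experts' scores
    probs = np.exp(logits[top_k] - logits[top_k].max())
    probs /= probs.sum()
    # only k of the n_experts matrices are multiplied -> sparse compute
    return sum(p * (expert_weights[e] @ x) for p, e in zip(probs, top_k))

d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate = rng.standard_normal((n_experts, d))
y = moe_layer(rng.standard_normal(d), experts, gate, k=2)
```

With k=2 of 16 experts active, each token touches only ~1/8 of the expert parameters, which is the same total-vs-active parameter gap the OLMoE review refers to.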
Weekly paper roundup: OLMoE (9/2/2024)
harmonious.ai
-
Harmonious' weekly paper roundup for the week of August 26, 2024. The reviewed papers collectively delve into various advancements in AI models, particularly focusing on multimodal, vision-language, and inference strategies. Several papers explore the enhancement of Large Language Models (LLMs) through innovative techniques such as improved inference patterns for long contexts, the utilization of mixed encoders, and energy-efficient on-device processing (WiM, Eagle, Dolphin). Another recurring theme is multimodality, with in-depth studies on optimizing LLMs for cross-modal alignment and real-time interactions in complex environments (Law of Vision Representation, GameNGen, CogVLM2). Further contributions include advancements in text-to-image diffusion models, audio language modeling, and AI-generated content in music, reflecting the expanding scope of AI applications (SwiftBrush v2, WavTokenizer, Foundation Models for Music). The practical impact of these models is underscored by initiatives to enhance the functionality and accessibility of benchmarks and operational pipelines, ensuring robust performance in real-world scenarios (SWE-bench-java, LlamaDuo, MME-RealWorld). https://lnkd.in/gw7za8-X #ai2incubator #harmonious
Weekly paper roundup: Writing in the Margins (8/26/2024)
harmonious.ai
-
About 3 years ago (Fall 2021), I wrote about LLMs and mused about the concept of task-centric AI as an emerging addition to Andrew Ng's initiative on data-centric AI:

"What's with the brouhaha around LLMs? Learning efficiency! Below is the famous GPT-3 graph that got everyone's attention ... Instead of building 10 models with 1,000 labels per, we could build 1,000 models with 10 labels per ... In the task-centric world, LLMs could open up the opportunity to help less technical folks build and use AI models without relying on an expensive data science team. No-code AI, powered by the XXL transformers near you? Scale.ai and Snorkel.ai are the poster-child unicorns of the data-centric AI world. Who will emerge as the representatives for the LLM task-centric world?"

While the term task-centric AI did not catch on (GenAI is the chosen one; I am neither Andrew Ng nor a marketing expert), it essentially describes the AI world we have today. Is there something as transformative lurking on the horizon? I have not seen anything like it while following recent AI research (and occasionally sharing thoughts on harmonious.ai). There will surely be one; let's be on the lookout. https://lnkd.in/gtQcm-cJ

PS: The Yoodli team is on a tear. Go Esha Joshi & Varun Puri! #ai2incubator #startup #llm #genai #task_centric_AI
AI2 Incubator Technology Newsletter - October 2021
Vu Ha on LinkedIn
-
On Thursday, Materia AI emerged from stealth to create an invaluable AI assistant and workspace for accounting firms. As the US continues to grapple with its growing accountant shortage, providing teams with the tools they need to augment their workflows will be critical to relieving the pressures on the industry. You can check out the product at https://www.trymateria.ai/ Congratulations Kevin Merlini and Lucas Adams - we can’t wait to see what the future holds! TechCrunch article here: https://lnkd.in/enneUrfw
Materia looks to make accountants more efficient with AI | TechCrunch
https://techcrunch.com
-
It's time to party!
Seattle Tech Week is Jul 29 to Aug 2 and we're hosting the party of the year! Are you an AI founder? AI investor? AI researcher? AI professor? RSVP now: https://lu.ma/ai2bbq Come hang with 700 AI researchers, professors, entrepreneurs, investors, engineers, and more! Celebrate the best of AI in the PNW with live music, startup science fair, cold beer and BBQ sliders w/ veggie options too. Musical guest: Steve Hall (https://lnkd.in/gQDCdkQA) NOTE: Registration required. Tickets will be checked at the door. Vendors/recruiters—please be respectful. This isn't the place to hustle. 😊
-
We have two words... "THANK YOU!" to Gaurav Oberoi, Emad Elwany, James Baird, and Jessica Nguyen for giving us the opportunity to be part of your amazing journey. We are so grateful for the time we spent together, and we are inspired every day by your leadership, determination, entrepreneurial brilliance, and so much more. May you enjoy every bit of your smashing success. https://lnkd.in/gH59HFcP
DocuSign acquires AI-powered contract management firm Lexion | TechCrunch
https://techcrunch.com
-
This is the 10th edition of Harmonious' weekly paper roundup series. This past week I did not find any paper that merits the spotlight designation, so instead I am experimenting with a brief overview of papers organized around the following topics:

- LLM applications: agents, chatbots, RAG, document understanding, coding, and others.
- LLM prompting: techniques such as CoT that help us get the most out of LLMs.
- Multimodal LLMs: an area where many folks expect a lot of advances in the next wave.
- Synthetic data and other novel ways to generate training data: despite the rise of LLMs and zero/few-shot learning, the data bottleneck is still present. It's helpful to find creative ways to get data, not only for fine-tuning but also for prior-generation ML approaches.
- Benchmarks and evaluations: we need to understand the strengths and limitations of LLMs.
- LLM fine-tuning/many-shot learning: an important option whenever prompting is not sufficient.
- Context: topics such as context length and limits, effective use of context, context compression, etc.
- LLM efficiency, primarily for fine-tuning and inference: obviously important for real-world deployment.
- LLM internals: how they work.
- LLM frontier: what's the next big leap beyond transformers? State-space models? Self-evolution?
- LLM announcements: e.g. Llama, Phi, etc.

I created this taxonomy after reading a few hundred papers for the first 9 editions of the weekly paper roundup series. The topics are roughly ordered by relevance to practitioners (obvious caveat: this is highly subjective). I may adjust the taxonomy if necessary, and not every topic will have papers in a given week.

Read more about the papers from the week of April 22 here: https://lnkd.in/gVWK2kgh Sign up at https://harmonious.ai/ to get a weekly update delivered to your inbox. #harmonious #ai2incubator
Weekly paper roundup (4/22/24)
harmonious.ai