This is the 10th edition of Harmonious' weekly paper roundup series. This past week I did not find a paper that merits the spotlight designation, so I am experimenting with a brief overview of papers organized around the following topics:

- LLM applications: agents, chatbots, RAG, document understanding, coding, and others.
- LLM prompting: techniques such as CoT that help us get the most out of LLMs.
- Multimodal LLMs: an area where many folks expect a lot of advances in the next wave.
- Synthetic data and other novel ways to generate training data. Despite the rise of LLMs and zero/few-shot learning, the data bottleneck is still present. It's helpful to find creative ways to get data, not only for fine-tuning but also for earlier generations of ML approaches.
- Benchmarks and evaluations: we need to understand the strengths and limitations of LLMs.
- LLM fine-tuning/many-shot learning: an important option wherever prompting is not sufficient.
- Context: topics such as context length and limits, effective use of context, context compression, etc.
- LLM efficiency, primarily for fine-tuning and inference. This is obviously important for real-world deployment.
- LLM internals: how they work.
- LLM frontier: what's the next big leap beyond transformers? State-space models? Self-evolution?
- LLM announcements: e.g. Llama, Phi, etc.

I created this taxonomy based on reading a few hundred papers for the first 9 editions of the weekly paper roundup series. The topics are roughly ordered by relevance to practitioners (obvious caveat: this is highly subjective). I may adjust this taxonomy if necessary, and not every topic will have papers in a given week. Read more about the papers from the week of April 22 here: https://lnkd.in/gVWK2kgh Sign up at https://harmonious.ai/ to get a weekly update delivered to your inbox. #harmonious.ai #ai2incubator
AI2 Incubator’s Post
More Relevant Posts
-
On Thursday, Materia AI emerged from stealth to create an invaluable AI assistant and workspace for accounting firms. As the US continues to grapple with its growing accountant shortage, providing teams with the tools they need to augment their workflows will be critical to relieving the pressures on the industry. You can check out the product at https://www.trymateria.ai/ Congratulations Kevin Merlini and Lucas Adams - we can’t wait to see what the future holds! TechCrunch article here: https://lnkd.in/enneUrfw
Materia looks to make accountants more efficient with AI | TechCrunch
-
It's time to party!
Seattle Tech Week is Jul 29 to Aug 2 and we're hosting the party of the year! Are you an AI founder? AI investor? AI researcher? AI professor? RSVP now: https://lu.ma/ai2bbq Come hang with 700 AI researchers, professors, entrepreneurs, investors, engineers, and more! Celebrate the best of AI in the PNW with live music, startup science fair, cold beer and BBQ sliders w/ veggie options too. Musical guest: Steve Hall (https://lnkd.in/gQDCdkQA) NOTE: Registration required. Tickets will be checked at the door. Vendors/recruiters—please be respectful. This isn't the place to hustle. 😊
-
We have two words... "THANK YOU!" to Gaurav Oberoi, Emad Elwany, James Baird, and Jessica Nguyen for giving us the opportunity to be part of your amazing journey. We are so grateful for the time we spent together, and we are inspired every day by your leadership, determination, entrepreneurial brilliance, and so much more. May you enjoy every bit of your smashing success. https://lnkd.in/gH59HFcP
DocuSign acquires AI-powered contract management firm Lexion | TechCrunch
-
For the second week in a row, Harmonious' spotlight paper is about a new benchmark: BLINK: Multimodal Large Language Models Can See but Not Perceive. Its authors are with UPenn, U Washington, AI2, UC Davis, and Columbia U. BLINK is a benchmark of 14 visual perception tasks that humans can solve "within a blink" but that pose significant challenges for current multimodal LLMs, because they resist mediation through natural language (i.e., dense captioning). While humans reach 96% accuracy, the best-performing multimodal LLMs (GPT-4V, Gemini Pro, and Claude Opus) achieve accuracies of 51%, 45%, and 43% respectively, not much better than random guessing (38%). This indicates that such perception abilities have not "emerged" yet in recent multimodal LLMs; notably, on certain tasks some multimodal LLMs even underperform random guessing. Specialist computer vision models solve these problems much better. Read our analysis on Harmonious at https://lnkd.in/gTMwH72C where we discuss related topics such as the Moravec paradox, System 1 and 2 thinking, and the path to AGI, in addition to recommendations for practitioners. Sign up at Harmonious.ai to never miss our weekly paper roundup! #harmonious #ai2incubator
Weekly paper roundup: BLINK: multimodal LLMs can see but not perceive (4/15/2024)
-
Harmonious's spotlight paper for the week of April 8 is RULER: What's the Real Context Size of Your Long-Context Language Models? (authors from NVIDIA). In the LLM world, a longer context window, as advertised in the spec, does not necessarily mean better performance. This paper introduces a new benchmark, RULER, to test LLMs' ability to handle long context. It is more demanding than the popular retrieval-focused needle-in-a-haystack (NIAH) benchmark, additionally testing co-reference resolution and aggregation. The authors evaluate GPT-4 and nine open-source LLMs. While all ten models accept 32K context, only four (GPT-4, Command-R, Yi-34B, and Mixtral 8x7B) perform acceptably at this length; GPT-4 unsurprisingly emerges as the winner. Read our analysis of this paper and other noteworthy papers at https://lnkd.in/grG2nBhN Update 5/2: The authors evaluated Gemini-1.5-Pro, which now takes over the 1st ranking from GPT-4. See the up-to-date results at: #ai2incubator #harmonious
Weekly paper roundup: RULER: real context size of LLMs (4/8/2024)
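For readers unfamiliar with the needle-in-a-haystack setup that RULER extends, it can be sketched in a few lines. This is an illustrative toy only, not the RULER harness: the function names, filler sentence, and substring-based scoring are made up for this sketch, and the actual model call is left to the reader.

```python
# Toy needle-in-a-haystack (NIAH) probe: bury one "needle" fact in long
# filler text, ask the model to retrieve it, and check the answer.
import random

def make_niah_prompt(needle: str, n_filler: int = 200, seed: int = 0) -> str:
    """Insert the needle sentence at a random position inside filler text."""
    rng = random.Random(seed)
    filler = ["The sky was clear and the market was quiet that day."] * n_filler
    pos = rng.randrange(len(filler) + 1)
    sentences = filler[:pos] + [needle] + filler[pos:]
    return " ".join(sentences) + "\nQuestion: what is the secret number? Answer:"

def score(model_answer: str, expected: str) -> bool:
    """Credit the model if the expected value appears anywhere in its answer."""
    return expected in model_answer

prompt = make_niah_prompt("The secret number is 7481.")
# Send `prompt` to your LLM of choice, then check: score(output, "7481")
```

Sweeping the needle position and the context length (here via `n_filler`) is what produces the familiar NIAH heatmaps; RULER goes further by also requiring co-reference resolution and aggregation across the context.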
-
Insight 14 is here. We announce Harmonious.ai, an online paper reading and discussion forum. Harmonious was born at the AI2 Incubator as part of our efforts to keep track of advances in AI in order to advise founders, and we decided to open (source) it to encourage wider participation from all AI builders. We also share observations and lessons learned from working alongside founders building pre-seed AI startups over the last seven years, as we wrap up the first quarter of 2024. Topics include how to pick the right idea, assemble a strong founding team, harness improving AI capabilities, secure compute resources, and navigate a tough fundraising environment. ICYMI, last month we shared the news about AI2 Incubator's $200M compute offering to startups. https://lnkd.in/gMsZuqzx
Insight 14: Navigating Up the Slope of Enlightenment