In the 52nd session of #MultimodalWeekly, we have three exciting researchers working in Human-Computer Interaction for video understanding, large-scale multimodal models, and video question answering. ✅ Saelyne Yang, a Ph.D. candidate at KAIST, will present her work on enhancing how people learn procedural tasks through how-to videos. ✅ Bo Li and Yuanhan Zhang, Ph.D. students at Nanyang Technological University, Singapore, will introduce recent works at LMMs-Lab, including LLaVA-NeXT, LongVA, and LMMs-Eval. ✅ Junbin Xiao, a Research Fellow at the National University of Singapore, will present his work on visually-grounded video question answering. Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord: https://lnkd.in/gRt4GdDx 🤝
Twelve Labs
Software Development
San Francisco, California 6,110 followers
Help developers build programs that can see, listen, and understand the world as we do.
About us
Helping developers build programs that can see, hear, and understand the world as we do by giving them the world's most powerful video-understanding infrastructure.
- Website: http://www.twelvelabs.io
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2021
Locations
- Primary: 555 Mission St, San Francisco, California 94105, US
Updates
-
Twelve Labs is excited to be joining the 2024 SVG Sports Content Management Forum in NYC on July 24! 🎙️ Panel Highlight: Our own Founding Solutions Architect Travis C. will be sharing insights on the 'AI Today and Tomorrow' panel, exploring how AI is reshaping sports media management with our trusted partners: Dustin Myers from FOX Sports, Byron Chapman from PGA TOUR, Jean-Christophe Curelop from Perfect Memory, and Tab B. 🤝 We’d love to connect and innovate together! Join us to learn how Twelve Labs is partnering and integrating with top sports networks, M&E companies, and MAMs to unlock new possibilities in content creation, management, and monetization. Learn how you can streamline your video workflows and add value to your content with us. See you there! #SVGSCMForum #SportsMedia #AIinSports Soyoung Lee Travis C. Maninder Saini Andy Vaughan
-
In the 51st session of #MultimodalWeekly, we have three exciting presentations from a startup founder and researchers working in Multimodal AI. ✅ Jay Chia, the co-founder of Eventual, will share the DIY multimodal data lake with Daft data frames. -> Check out Daft: https://www.getdaft.io/ ✅ Saptarshi Sinha, a Ph.D. researcher at the University of Bristol, will present his work "Every Shot Counts: Using Exemplars for Repetition Counting in Videos." -> Read the paper: https://lnkd.in/gUdY2NCh ✅ Yunhua Zhang, a Ph.D. candidate at UvA, will present her work "Low-Resource Vision Challenges for Foundation Models." -> Read the paper: https://lnkd.in/gXSe_ed5 Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord to connect with the speakers: https://lnkd.in/gRt4GdDx 🤝
-
~ New Webinar ~ The recording of #MultimodalWeekly 49 with Jiwoo Hong from KAIST AI and Associate Professor Lei Huang and Baichuan Zhou from Beihang University is up! 📺 Watch here: https://lnkd.in/gjtn4ZQX They discussed: - Motivation for ORPO: RLHF with PPO, DPO, and SFT in alignment - Experimental results of ORPO in single-turn and multi-turn instruction following - Efficiency and scalability of ORPO - The opportunity for small-scale LMMs - How to merge the vision modality into small LMMs - TinyLLaVA: From the model, data, and training perspectives Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Single-Step Language Model Alignment & Smaller-Scale Large Multimodal Models | Multimodal Weekly 49
-
~ New Webinar ~ The recording of #MultimodalWeekly 48 with Letian (Max) Fu from the University of California, Berkeley and Bo Zhao from the Beijing Academy of Artificial Intelligence (BAAI) is up! 📺 Watch here: https://lnkd.in/gX_iSzuD They discussed: - Touch as a sensing modality is missing in multimodal models - Touch-vision-language dataset - TVL-Tactile Encoder & TVL-LLaMA - SVIT: Scaling Up Visual Instruction Tuning - Bunny: A concise open-source lightweight multimodal LLM - M3D: Advancing 3D medical image analysis with multimodal LLMs - MLVU: A comprehensive benchmark for multi-task long video understanding Join our Discord community: discord.gg/Sh6BRfakJa 🤝
Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM | Multimodal Weekly 48
-
In the 50th session of #MultimodalWeekly, we have two exciting presentations from startup founders building real-world products for Multimodal AI applications. ✅ Jesse N. Clark, the Co-Founder and CTO of Marqo AI, will discuss generalized contrastive learning for multimodal retrieval and ranking. They generalize the popular CLIP training method to accommodate any number of text fields and images when representing documents, and to encode relevance (or rank) for better first-stage retrieval. 📄 ✅ Alexandre Berkovic, the Co-Founder and CEO of Adorno AI, will dive into how video and audio understanding technologies from Twelve Labs and Adorno AI are transforming video production. 📻 Register for the webinar here: https://lnkd.in/gJGtscSH 👈 Join our Discord community: https://lnkd.in/gRt4GdDx
-
Twelve Labs will be attending AWS Summit NY on July 10! Connect with our team to learn how you can streamline all your video-related workflows with our multimodal AI models and discuss the latest in tech. Don’t hesitate to say hello when you spot any of our team members: Jae Lee, Soyoung Lee, Maninder Saini, and Andy Vaughan. We can’t wait to see everyone there! #AWSSummit #AWSNY
-
We're excited to announce a new collaboration with the Phyllo team to transform video insights on social media 😉 🌟 Why This Matters 🌟 With social media shifting to video, extracting insights is crucial. Video posts get up to 10 times more engagement, and 74% of users take action after viewing a brand's video. 🔍 The Phyllo and Twelve Labs Advantage 🔍 Phyllo: - Customizable searches across 15 social media platforms. - Cost-effective social data access. Twelve Labs: - Foundation models that analyze videos through visual, audio, and text modalities. - Semantic video search, zero-shot classification, video-to-text generation, and multimodal video embeddings. 🌐 Innovative Use Cases 🌐 1 - Insights for Videos: Get detailed answers, summaries, and sentiment analysis. 2 - Product Development: Analyze product usage in social videos. 3 - Byte-Sized Segments: Break long videos into short clips for Instagram and TikTok. 4 - Influencer Insights: Identify influencers using specific products and their impact. Read more about our collaboration here: https://lnkd.in/gC9Zjmgp 👀
-
~ New Webinar ~ The video recording of #MultimodalWeekly 47 with Benjamin Muller, Tu Anh NGUYEN, and Bokai Yu from AI at Meta is up! 📺 Watch here: https://lnkd.in/guZ5C_mU 👀 They discussed: - Challenges of expressive speech generation - SpiRit-LM combines TextLM and SpeechLM - SpiRit-LM training recipe and generation samples - Evaluation: zero-shot, few-shot, and text-speech sentiment-preservation benchmark - Can we observe the speech-text alignment? Join our Discord community: discord.gg/Sh6BRfakJa 🤝
SpiRit-LM, an Interleaved Spoken and Written Language Model | Multimodal Weekly 47
-
🏇 We are excited to announce the launch of Jockey: A Conversational Video Agent powered by Twelve Labs APIs and LangGraph from LangChain! Here's why developers should dive into Jockey: 👇 1 - Advanced Video Understanding: Jockey utilizes Twelve Labs' state-of-the-art video foundation models to extract rich insights from video content, offering capabilities like video search, classification, summarization, and more. 📽 2 - Flexible and Scalable Framework: Built on LangGraph, Jockey provides unparalleled control over the flow of code, prompts, and LLM calls, facilitating robust human-agent collaboration and ensuring reliable performance. ⛓ 3 - Efficient and Precise Architecture: Jockey's architecture includes key components such as the Supervisor, the Planner, and specialized Workers that handle tasks like video search, text generation, and editing, ensuring optimal token usage and accurate node responses. 🏛 4 - Customizable and Extensible: Jockey's modular design allows for easy customization and extension. Developers can modify prompts, extend state management, or add new workers to tailor Jockey to specific needs, making it a versatile foundation for advanced video AI applications. 🤟 Full blog post here: https://lnkd.in/gbudqhKM 😎
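The Supervisor/Planner/Worker flow described above can be sketched in a few lines of plain Python. This is a hypothetical illustration of the orchestration pattern only, not the actual Jockey or LangGraph code; all function names, routing keywords, and worker outputs below are assumptions made for the example:

```python
# Minimal sketch of a supervisor/planner/worker agent loop, in the spirit of
# Jockey's architecture. Illustrative only: real Jockey builds this graph with
# LangGraph and calls Twelve Labs APIs inside the workers.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentState:
    request: str
    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)

def planner(state: AgentState) -> AgentState:
    # Break the user request into worker tasks (toy keyword routing here;
    # the real system would use an LLM call to produce the plan).
    if "find" in state.request.lower():
        state.plan.append("video-search")
    if "summarize" in state.request.lower():
        state.plan.append("text-generation")
    return state

def video_search_worker(state: AgentState) -> AgentState:
    # Stand-in for a semantic video search call.
    state.results.append(f"search results for: {state.request}")
    return state

def text_generation_worker(state: AgentState) -> AgentState:
    # Stand-in for a video-to-text summarization call.
    state.results.append("summary of retrieved clips")
    return state

WORKERS: dict[str, Callable[[AgentState], AgentState]] = {
    "video-search": video_search_worker,
    "text-generation": text_generation_worker,
}

def supervisor(state: AgentState) -> AgentState:
    # The supervisor asks the planner for a task list, then dispatches
    # each task to its specialized worker, threading the shared state through.
    state = planner(state)
    for task in state.plan:
        state = WORKERS[task](state)
    return state

final = supervisor(AgentState(request="Find goals and summarize the match"))
print(final.plan)  # ['video-search', 'text-generation']
```

The key design point this sketches is the shared state object: each worker reads from and appends to the same state, which is what lets a graph framework like LangGraph checkpoint progress and insert human-in-the-loop steps between nodes.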
-