Stars
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2…
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither tracked nor profiled.
Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
An AI-powered search engine with a generative UI
Writing AI Conference Papers: A Handbook for Beginners
This is the repository for the Tool Learning survey.
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
[CVPR 2024] The code for paper 'Towards Learning a Generalist Model for Embodied Navigation'
Grounded search engine (i.e. with source references) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search, etc.
An open-source RAG-based tool for chatting with your documents.
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
[CVPR 2024] MemFlow: Optical Flow Estimation and Prediction with Memory
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
The First Multimodal Search Engine Pipeline and Benchmark for LMMs
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.
MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
Vector Neurons: A General Framework for SO(3)-Equivariant Networks
GRUtopia: Dream General Robots in a City at Scale