Stars
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
A curated list of awesome big data frameworks, resources and other awesomeness.
A curated list of awesome System Design (A.K.A. Distributed Systems) resources.
System design patterns for machine learning
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Official implementation of SEED-LLaMA (ICLR 2024).
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
[NeurIPS 2024 D&B Track] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Code used for the creation of OBELICS, an open, massive and curated collection of interleaved image-text web documents, containing 141M documents, 115B text tokens and 353M images.
When do we not need larger vision models?
LlamaIndex is a data framework for your LLM applications
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[ECCV 2024] IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Pipeline for pulling and processing online language model pretraining data from the web
Set of tools to assess and improve LLM security.