Stars
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
LPIPS metric. pip install lpips
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
A repository of models, textual inversions, and more
Evaluating text-to-image/video/3D models with VQAScore
Recipes to train reward models for RLHF.
Accessible large language models via k-bit quantization for PyTorch.
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
A framework for prompt tuning using Intent-based Prompt Calibration
Official inference repo for FLUX.1 models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Agentic components of the Llama Stack APIs
Efficiently Fine-Tune 100 LLMs in WebUI (ACL 2024)
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Scalable toolkit for efficient model alignment
Official PyTorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Disaggregated serving system for Large Language Models (LLMs).
A fast communication-overlapping library for tensor parallelism on GPUs.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Span-based Localizing Network for Natural Language Video Localization (ACL 2020)