Apple AI/ML
- Cupertino, CA
- haotian-zhang.github.io/
- @HaotianZhang4AI
Stars
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Scaling Diffusion Transformers with Mixture of Experts
Visualize expert firing frequencies across sentences in the Mixtral MoE model
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook
Code & Data for Grounded 3D-LLM with Referent Tokens
[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40 benchmarks
Code for 3D-LLM: Injecting the 3D World into Large Language Models
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Implementation of Infini-Transformer in PyTorch
Lumina-T2X is a unified framework for Text to Any Modality Generation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Vector (and Scalar) Quantization, in PyTorch
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"