Haotian Zhang (Haotian-Zhang)

👋

Welcome

Research Scientist @ Apple. Ex-Research Intern @ MSR AI. Ph.D. candidate @ UW. Be Borderless.

369 followers · 58 following

Apple AI/ML
Cupertino, CA
haotian-zhang.github.io/
@HaotianZhang4AI

Stars

apple / ml-slowfast-llava

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Python 112 8 Updated Aug 29, 2024

baaivision / DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 103 1 Updated Aug 23, 2024

IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Python 2,147 125 Updated Aug 29, 2024

facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 10,680 857 Updated Aug 21, 2024

feizc / DiT-MoE

Scaling Diffusion Transformers with Mixture of Experts

Python 178 7 Updated Sep 9, 2024

mira-space / Mira

Python 332 11 Updated Sep 2, 2024

ajtejankar / mixtral-vis-moe

Visualize expert firing frequencies across sentences in the Mixtral MoE model

Python 17 2 Updated Dec 22, 2023

DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 715 48 Updated Sep 13, 2024

mlfoundations / dclm

DataComp for Language Models

HTML 1,103 96 Updated Sep 5, 2024

test-time-training / ttt-lm-pytorch

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 970 54 Updated Jul 14, 2024

NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,781 140 Updated Sep 10, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,683 110 Updated Sep 10, 2024

NVlabs / DiffiT

[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation

431 13 Updated Jul 1, 2024

wilson1yan / VideoGPT

Jupyter Notebook 962 117 Updated Apr 27, 2024

FoundationVision / vaex

🔥stable, simple, state-of-the-art VQVAE toolkit & cookbook

Python 34 Updated Jun 23, 2024

OpenRobotLab / Grounded_3D-LLM

Code&Data for Grounded 3D-LLM with Referent Tokens

Python 72 Updated Jul 1, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

28,230 1,538 Updated Aug 1, 2024

Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.

Python 221 9 Updated Jul 17, 2024