- University of Science and Technology of China (USTC)
- Hefei, China
- https://zhendongwang6.github.io/
- https://scholar.google.com.hk/citations?user=Ya5VDjQAAAAJ&hl=zh-CN
Lists (24)
chatgpt
clip
controlnet
dataset
diffusion model
face-anti-spoofing
face-forgery-detection
flow
gan
img2img
knowledge distillation
large language models
large vision model
ocr
pretrain
SAM series
score metrics
segmentation
subject driven generation
survey
tools
vae
vision_language
visual text generation
Stars
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
🔥ImageFolder: Autoregressive Image Generation with Folded Tokens
CAR: Controllable AutoRegressive Modeling for Visual Generation
Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Official implementation of "EG4D: Explicit Generation of 4D Object without Score Distillation"
Lumina-T2X is a unified framework for Text to Any Modality Generation
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
Latte: Latent Diffusion Transformer for Video Generation.
VideoSys: An easy and efficient system for video generation
[NeurIPS 2024] GaussianCube: A Structured and Explicit Radiance Representation for 3D Generative Modeling
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
Official Implementation of Rectified Flow (ICLR2023 Spotlight)
a state-of-the-art-level open visual language model | multimodal pretrained model
[WIP] Layer Diffusion for WebUI (via Forge)
A collection of resources on controllable generation with text-to-image diffusion models.
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model