- Shanghai, China.
- wzmsltw.github.io
Stars
Layout-Guided multi-view driving scene video generation with latent diffusion model
[ICRA 2024] RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision. (Former version: UniOcc)
[CVPR 2024] Symphonies (Scene-from-Insts): Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
[BMVC 2024] Official implementation of Align-DETR
[IEEE T-PAMI 2024] All you need for End-to-end Autonomous Driving
Official code base of the BEVDet series.
Vision-Centric BEV Perception: A Survey
[CVPR'22 Oral] GMFlow: Learning Optical Flow via Global Matching
Official PyTorch implementation of "BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework"
MulimgViewer is a multi-image viewer that can open multiple images in one interface, which is convenient for image comparison and image stitching.
Official JAX Implementation of MaskGIT
[CVPR'22] ICON: Implicit Clothed humans Obtained from Normals
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Official PyTorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)
Code for the ECCV 2022 paper "Unleashing Transformers"
A unified 3D Transformer Pipeline for visual synthesis
GLIDE: a diffusion-based text-conditional image synthesis model
Useful resources for creating Design Artificial Intelligence
Very Long Natural Scenery Image Prediction by Outpainting, ICCV 2019, TensorFlow
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
AI-powered Text-to-Art Generator - Text2Art.com