Institute of Automation, Chinese Academy of Sciences
- China
Stars
The Dawn of Video Generation: Preliminary Explorations with SORA-like Models
A collection of papers on world models for autonomous driving.
Code for the benchmark - DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving.
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
This repository is the official implementation of Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer.
Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in PyTorch
[ECCV 2024] 3D World Model for Autonomous Driving
This repo aims to customize moving trajectories in a video.
[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
To support and further research in the field of portrait animation, we are excited to launch PhotoPoster, an open project for pose-driven image generation.
[CVPR 2024] LAMP: Learn a Motion Pattern for Few-Shot Based Video Generation
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Understand Human Behavior to Align True Needs
[CVPR 2024 Highlight] Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Implemented BEVFormer support for BEV segmentation
Iterable data pipelines for PyTorch training.
The devkit of the nuScenes dataset.
Stable Video Diffusion Training Code and Extensions.
[CVPR 2024] FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation