
[ECCV 2024 - Oral] AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild

arXiv | Project Page | YouTube


Junho Park*, Kyeongbo Kong* and Suk-Ju Kang†

(* Equal contribution, † Corresponding author)

TL;DR

We propose AttentionHand, a novel method for text-driven controllable hand image generation. Our method requires four easy-to-use modalities (i.e., an RGB image, a hand mesh image from a 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space in the encoding phase. In the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. The highlighted embedding is then fed to the visual attention stage, where hand-related regions are attended by conditioning on global and local hand mesh images within the diffusion-based pipeline. Finally, in the decoding phase, the final feature is decoded into new hand images that are well aligned with the given hand mesh image and text prompt.

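The following is a minimal, self-contained sketch of this four-phase dataflow. All modules, names, and shapes are hypothetical placeholders chosen for illustration only; the actual architecture lives in this repository's code.

```python
# Toy sketch of the four phases above (hypothetical placeholder
# modules, NOT the repository's actual architecture).
import torch
import torch.nn as nn

class AttentionHandSketch(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Encoding phase: embed the four modalities into one latent.
        self.encode = nn.Linear(4 * dim, dim)
        # Text attention stage: hand-related tokens highlight hand regions.
        self.text_attend = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Visual attention stage: condition on global/local hand mesh images.
        self.visual_attend = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Decoding phase: map the final feature back to image space.
        self.decode = nn.Linear(dim, dim)

    def forward(self, rgb, mesh, bbox, text):
        # Each input: a (batch, dim) toy embedding of one modality.
        z = self.encode(torch.cat([rgb, mesh, bbox, text], dim=-1)).unsqueeze(1)
        z, _ = self.text_attend(z, text.unsqueeze(1), text.unsqueeze(1))
        z, _ = self.visual_attend(z, mesh.unsqueeze(1), mesh.unsqueeze(1))
        return self.decode(z.squeeze(1))

# Toy usage with random embeddings:
m = AttentionHandSketch()
rgb, mesh, bbox, text = (torch.randn(2, 64) for _ in range(4))
print(m(rgb, mesh, bbox, text).shape)  # torch.Size([2, 64])
```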

What's New

[2024/11/22] ⭐ We release the training & inference code! Enjoy! 😄

[2024/08/12] 🚀 Our paper will be presented as an oral at ECCV 2024!

[2024/07/03] 🔥 Our paper is accepted to ECCV 2024!

Install

pip install -r requirements.txt

Inference

  1. Download our pre-trained model attentionhand.ckpt from here.
  2. Set your own modalities in samples. (We provide some samples for a quick start.)
  3. Put the samples and the downloaded weight as follows (a quick layout check is sketched after this list).
${ROOT}
|-- samples
|   |-- mesh
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Run inference.py.
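
Before running, you can sanity-check the layout from step 3 with a short script like this. The paths are taken from the tree above; the script itself is our illustration and is not part of the repository.

```python
# Hypothetical sanity check for the layout in step 3; not part of the repo.
from pathlib import Path

for p in ["samples/mesh", "samples/text", "samples/modalities.json",
          "weights/attentionhand.ckpt"]:
    print(("OK      " if Path(p).exists() else "MISSING ") + p)
```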

Train from scratch

  1. Download the initial model sd15_ini.ckpt from here.
  2. Download the pre-processed dataset dataset.tar.gz from here (an unpacking sketch follows this list).
  3. Put the downloaded weight and dataset as follows.
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- sd15_ini.ckpt
  4. Run train.py.
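
If you prefer to unpack the archive programmatically, here is a minimal sketch. It assumes dataset.tar.gz sits in the repository root and unpacks into the data/ tree shown in step 3; adjust the paths if your download differs.

```python
# Minimal unpacking sketch; assumes dataset.tar.gz is in the repo root
# and that its contents match the data/ tree shown in step 3.
import tarfile

with tarfile.open("dataset.tar.gz", "r:gz") as tar:
    tar.extractall(".")
```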

Fine-tuning

  1. Download our pre-trained model attentionhand.ckpt from here.
  2. Set your own modalities in data, following the same structure as dataset.tar.gz from here.
  3. Put the downloaded weight and dataset as follows.
${ROOT}
|-- data
|   |-- mesh
|   |   |-- ...
|   |-- rgb
|   |   |-- ...
|   |-- text
|   |   |-- ...
|   |-- modalities.json
|-- weights
|   |-- attentionhand.ckpt
  4. Change resume_path in train.py to weights/attentionhand.ckpt (see the one-line snippet below).
  5. Run train.py.
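
The change in step 4 amounts to a one-line edit in train.py (variable name as given above):

```python
# In train.py: resume from the pre-trained AttentionHand weights
# instead of the initial SD 1.5 checkpoint.
resume_path = "weights/attentionhand.ckpt"
```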

Related Repositories

Special thanks to the great projects ControlNet and Attend-and-Excite!

License and Citation

All assets and code are under the repository's license unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{park2024attentionhand,
  author    = {Park, Junho and Kong, Kyeongbo and Kang, Suk-Ju},
  title     = {AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild},
  booktitle = {European Conference on Computer Vision},
  year      = {2024},
}
