[ECCV 2024 - Oral] AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
Junho Park*, Kyeongbo Kong* and Suk-Ju Kang†
(* Equal contribution, † Corresponding author)
- Presented by Sogang University, LG Electronics, and Pusan National University
- Primary contact: Junho Park ( [email protected] )
We propose AttentionHand, a novel method for text-driven controllable hand image generation. Our method requires four easy-to-obtain modalities (i.e., an RGB image, a hand mesh image rendered from the 3D label, a bounding box, and a text prompt). These modalities are embedded into the latent space in the encoding phase. Then, in the text attention stage, hand-related tokens from the given text prompt are attended to highlight hand-related regions of the latent embedding. The highlighted embedding is fed to the visual attention stage, where hand-related regions of the embedding are attended by conditioning on global and local hand mesh images within the diffusion-based pipeline. In the decoding phase, the final feature is decoded into new hand images that are well aligned with the given hand mesh image and text prompt.
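To make the data flow concrete, here is a minimal, self-contained PyTorch sketch of the encode → text attention → visual attention → decode order described above. All class names, tensor shapes, and the residual-attention design are hypothetical stand-ins for illustration, not the repository's actual modules:

```python
import torch
import torch.nn as nn

class TextAttentionStage(nn.Module):
    """Attends text tokens to highlight hand-related regions in the latent."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, latent, text_tokens):
        # latent: (B, N, C) flattened latent embedding; text_tokens: (B, T, C)
        highlighted, _ = self.attn(latent, text_tokens, text_tokens)
        return latent + highlighted  # residual highlighting

class VisualAttentionStage(nn.Module):
    """Conditions the highlighted latent on global and local mesh embeddings."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, latent, mesh_tokens):
        conditioned, _ = self.attn(latent, mesh_tokens, mesh_tokens)
        return latent + conditioned

B, N, T, C = 1, 256, 8, 64             # batch, latent tokens, text tokens, channels
latent = torch.randn(B, N, C)          # encoding phase: embedded RGB image
text_tokens = torch.randn(B, T, C)     # embedded text prompt
mesh_tokens = torch.randn(B, 2 * N, C) # global + local hand mesh embeddings

latent = TextAttentionStage(C)(latent, text_tokens)    # text attention stage
latent = VisualAttentionStage(C)(latent, mesh_tokens)  # visual attention stage
print(latent.shape)  # decoding phase (omitted) would map this back to image space
```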
[2024/11/22] ⭐ We release train & inference code! Enjoy! 😄
[2024/08/12] 🚀 Our paper will be presented as an oral at ECCV 2024!
[2024/07/03] 🔥 Our paper is accepted by ECCV 2024!
Environment
Install the dependencies with:
pip install -r requirements.txt
Inference
- Download our pre-trained model `attentionhand.ckpt` from here.
- Set your own modalities in `samples`. (We provide some samples for fast implementation; a sanity-check sketch follows this list.)
- Put the samples and the downloaded weight as follows.
${ROOT}
|-- samples
| |-- mesh
| | |-- ...
| |-- text
| | |-- ...
| |-- modalities.json
|-- weights
| |-- attentionhand.ckpt
- Run `inference.py`.
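Before running inference on your own modalities, you can sanity-check the layout with a small stdlib-only script. The expected schema of `modalities.json` is best learned from the provided samples, so this sketch only verifies the directory tree above and prints the start of the file (all paths are assumptions taken from that tree):

```python
import json
from pathlib import Path

ROOT = Path(".")  # adjust to your ${ROOT}

# Verify the layout expected by inference.py (see the tree above).
for required in ["samples/mesh", "samples/text",
                 "samples/modalities.json", "weights/attentionhand.ckpt"]:
    path = ROOT / required
    print(("OK      " if path.exists() else "MISSING ") + required)

# modalities.json ties the sample files together; print the first part of it.
modalities = json.loads((ROOT / "samples" / "modalities.json").read_text())
print(json.dumps(modalities, indent=2)[:500])
```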
Train from scratch
- Download the initial model `sd15_ini.ckpt` from here.
- Download the pre-processed dataset `dataset.tar.gz` from here (a layout-check sketch follows this list).
- Put the downloaded weight and dataset as follows.
${ROOT}
|-- data
| |-- mesh
| | |-- ...
| |-- rgb
| | |-- ...
| |-- text
| | |-- ...
| |-- modalities.json
|-- weights
| |-- sd15_ini.ckpt
- Run `train.py`.
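If you downloaded `dataset.tar.gz`, the following stdlib-only sketch unpacks it and confirms the layout matches the tree above. The archive location and extraction target are assumptions; adjust them to where you saved the file:

```python
import tarfile
from pathlib import Path

ROOT = Path(".")  # adjust to your ${ROOT}

# Unpack the pre-processed dataset (skip if data/ is already populated).
archive = ROOT / "dataset.tar.gz"
if archive.exists():
    with tarfile.open(archive) as tar:
        tar.extractall(ROOT)  # assumes the archive contains the data/ folder

# Confirm the layout train.py expects (see the tree above).
for required in ["data/mesh", "data/rgb", "data/text",
                 "data/modalities.json", "weights/sd15_ini.ckpt"]:
    print(("OK      " if (ROOT / required).exists() else "MISSING ") + required)
```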
Fine-tune with your own dataset
- Download our pre-trained model `attentionhand.ckpt` from here.
- Set your own modalities in `data`, following the format of `datasets.tar.gz` in here.
- Put the downloaded weight and dataset as follows.
${ROOT}
|-- data
| |-- mesh
| | |-- ...
| |-- rgb
| | |-- ...
| |-- text
| | |-- ...
| |-- modalities.json
|-- weights
| |-- attentionhand.ckpt
- Change `resume_path` in `train.py` to `weights/attentionhand.ckpt` (illustrated below).
- Run `train.py`.
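Since the training script follows a ControlNet-style setup (see the acknowledgement below), this change is typically a one-line edit near the top of `train.py`. The variable name comes from the step above, but its exact location is an assumption, so treat this as an illustration:

```python
# In train.py: resume from the pre-trained AttentionHand checkpoint instead of
# the Stable Diffusion initialization (hypothetical placement of the variable).
# resume_path = "weights/sd15_ini.ckpt"     # used when training from scratch
resume_path = "weights/attentionhand.ckpt"  # used when fine-tuning
```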
Special thanks to the great projects ControlNet and Attend-and-Excite!
All assets and code are under the license unless specified otherwise.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@inproceedings{park2024attentionhand,
  author    = {Park, Junho and Kong, Kyeongbo and Kang, Suk-Ju},
  title     = {AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild},
  booktitle = {European Conference on Computer Vision},
  year      = {2024},
}