Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Abstract
Recent work showed that large diffusion models can be reused as highly precise monocular depth estimators by casting depth estimation as an image-conditional image generation task. While the proposed model achieved state-of-the-art results, high computational demands due to multi-step inference limited its use in many scenarios. In this paper, we show that the perceived inefficiency was caused by a flaw in the inference pipeline that has so far gone unnoticed. The fixed model performs comparably to the best previously reported configuration while being more than 200× faster. To optimize for downstream task performance, we perform end-to-end fine-tuning on top of the single-step model with task-specific losses and get a deterministic model that outperforms all other diffusion-based depth and normal estimation models on common zero-shot benchmarks. We surprisingly find that this fine-tuning protocol also works directly on Stable Diffusion and achieves comparable performance to current state-of-the-art diffusion-based depth and normal estimation models, calling into question some of the conclusions drawn from prior works.
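The abstract mentions "task-specific losses" for end-to-end fine-tuning but does not define them. As an illustration only, the sketch below shows one loss commonly used for relative depth: a scale-and-shift-invariant (affine-invariant) error, where the prediction is first aligned to the ground truth by a least-squares scale and shift. The choice of this loss and the function names `fit_scale_shift` and `affine_invariant_loss` are assumptions made for this example, not details taken from the abstract. Depth maps are represented as flat lists of floats to keep the example dependency-free.

```python
from typing import Sequence, Tuple


def fit_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Return the least-squares (scale, shift) minimizing sum((scale*pred + shift - target)^2)."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        # A constant prediction carries no scale information; fit the shift only.
        return 0.0, mean_t
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def affine_invariant_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error after aligning pred to target with a fitted scale and shift."""
    scale, shift = fit_scale_shift(pred, target)
    return sum(abs(scale * p + shift - t) for p, t in zip(pred, target)) / len(pred)


if __name__ == "__main__":
    # A prediction that differs from the target only by scale and shift has (near) zero loss.
    print(affine_invariant_loss([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
    # A prediction with a different shape is penalized.
    print(affine_invariant_loss([1.0, 2.0, 3.0], [1.0, 4.0, 9.0]))
```

A loss of this kind ignores the global scale and offset of the prediction, which suits zero-shot relative depth benchmarks where absolute scale is unknown. In a real training loop it would be written with a differentiable tensor library rather than plain lists.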