
Add Relation DETR #34900

Open
wants to merge 79 commits into base: main
Changes from all commits (79 commits)
61d6017
Add RelationDETR
xiuqhou Nov 21, 2024
843d05b
Align Relation-DETR with official implementation
xiuqhou Nov 21, 2024
546d038
Add ImageProcessor for Relation-DETR
xiuqhou Nov 21, 2024
b7fab06
Fix forward errors in Relation-DETR
xiuqhou Nov 22, 2024
f5aa391
Align RelationDetrLoss with official implementation
xiuqhou Nov 22, 2024
892b9a0
Fix tuple sequence in Relation-DETR
xiuqhou Nov 22, 2024
5c730dc
Fix data types in Relation-DETR
xiuqhou Nov 24, 2024
67ac5ab
Rename def_relation_detr to relation_detr
xiuqhou Nov 24, 2024
7c09309
Add image_processing_relation_detr tester
xiuqhou Nov 24, 2024
e47b9a7
Make sure return equivalence of RelationDETR
xiuqhou Nov 24, 2024
c90eae5
Fix types of auxiliary_outputs or relation_detr
xiuqhou Nov 24, 2024
fd301e4
Fix logits and pred_boxes for dn_out
xiuqhou Nov 24, 2024
8f37eb8
Fix out of range of selection in relation-detr
xiuqhou Nov 24, 2024
06be427
Fix initialization in RelationDETR
xiuqhou Nov 24, 2024
cfa7f4d
Integrate RelationDETR with transformers
xiuqhou Nov 24, 2024
795b5f8
Refactor image processor of relation-detr
xiuqhou Nov 24, 2024
2fb3052
Remove _tied_weights_keys in RelationDETR
xiuqhou Nov 24, 2024
3e18eca
Add test_modeling relation_detr
xiuqhou Nov 24, 2024
b7a8e47
Correct code style
xiuqhou Nov 24, 2024
b11451a
Style of code to pass fix-copies
xiuqhou Nov 24, 2024
17771fe
Doc update for RelationDetrConfig
xiuqhou Nov 24, 2024
2ec689a
Code quality
xiuqhou Nov 24, 2024
424c6d6
Remove trailing whitespace
xiuqhou Nov 24, 2024
0b8b7c2
Remove unused `max_position_embeddings`
xiuqhou Nov 24, 2024
d90315f
Refactor config attribute
xiuqhou Nov 25, 2024
b0763b4
Add model_doc for relation_detr
xiuqhou Nov 25, 2024
a3fc1c1
Update integration tests for relation-detr
xiuqhou Nov 25, 2024
003a9de
Fix level_embed initialization of relation-detr
xiuqhou Nov 25, 2024
78fe8dc
Code style of relation-detr
xiuqhou Nov 25, 2024
ee8c0c1
Add RelationDetrImageProcessorFast
xiuqhou Nov 26, 2024
d0ac9b0
Update import structure for relation-detr
xiuqhou Nov 26, 2024
f40d30b
Fix __init__ for image_processor_fast
xiuqhou Nov 26, 2024
ab19eec
Add format postprocess of relation-detr for backbone
xiuqhou Nov 26, 2024
8c9e11e
Add post-process for RelationDetr backbone
xiuqhou Nov 27, 2024
abe0978
Fix contributor in model_doc of RelationDETR
xiuqhou Nov 28, 2024
7990348
Add Copyright header in loss_relation_detr
xiuqhou Nov 28, 2024
1a97869
Remove **kwargs and masks in RelationDetrHungarianMatcher
xiuqhou Nov 28, 2024
75b3db8
Refactor RelationDetrLoss params to RelationDetrConfig
xiuqhou Nov 28, 2024
8fd6b73
Update docstring format in loss of RelationDETR
xiuqhou Nov 28, 2024
e2904a6
Update src/transformers/models/auto/configuration_auto.py
xiuqhou Nov 28, 2024
9d890e4
Update Relation DETR to RelationDETR in doc table
xiuqhou Nov 28, 2024
7295e12
Update src/transformers/models/relation_detr/image_processing_relatio…
xiuqhou Nov 28, 2024
7d6c28f
Remove code about backward compatability in RelationDetrConfig
xiuqhou Nov 28, 2024
2e2c611
Update comment in RelationDetrImageProcessor
xiuqhou Nov 28, 2024
d4eeebd
Fix error caused by backbone_config dict
xiuqhou Nov 28, 2024
971c6cb
Fix Copyright header in RelationDETR
xiuqhou Nov 28, 2024
1041dbf
Remove backward compatability and unused params in image_processing o…
xiuqhou Nov 28, 2024
7bddbc0
Add copied from comment for RelationDETR
xiuqhou Dec 3, 2024
c18a3b1
Refactor docstring and param for RelationDetr
xiuqhou Dec 3, 2024
a75fd73
Refactor RelationDetrConvEncoderPostLayerNorm
xiuqhou Dec 3, 2024
1ab7eff
Add `Copied from` and `Modified from` for RelationDETR
xiuqhou Dec 3, 2024
e4fe1b4
Fix threshold filter in post process of RelationDetr
xiuqhou Dec 3, 2024
d2a867a
Update test case of RelationDETR
xiuqhou Dec 3, 2024
da40fa6
Use config as input param of RelationDetrSinePositionEmbedding
xiuqhou Dec 3, 2024
5727c38
Remove position_embedding_type in RelationDetrConfig
xiuqhou Dec 3, 2024
b68ac3d
Remove back compat in backbone of RelationDetr
xiuqhou Dec 3, 2024
266914a
Update test config of RelationDetr
xiuqhou Dec 3, 2024
ab7c1d4
Add more type hint and docstring for RelationDetr
xiuqhou Dec 3, 2024
920401e
Add convert script for resnet and swin backbone
xiuqhou Dec 4, 2024
29a7a02
Add convert script for focalnet backbone
xiuqhou Dec 4, 2024
979de68
Use print rather than logger for stdout
xiuqhou Dec 4, 2024
99c647b
Remove no_grad context for replace_batch_norm
xiuqhou Dec 26, 2024
30ca937
Update param comment with more descriptive names
xiuqhou Dec 26, 2024
88849ba
Update param comment
xiuqhou Dec 26, 2024
057c4a3
Transpose features based on backbone_features_format
xiuqhou Dec 26, 2024
5efc1e1
Update src/transformers/models/relation_detr/modeling_relation_detr.py
xiuqhou Dec 26, 2024
c62ad9a
Update param and docstring for `backbone_features_format`
xiuqhou Dec 26, 2024
550def3
Remove `with_pos_embed`
xiuqhou Dec 26, 2024
2617796
Use backbone_features_format in RelationDetrConvEncoder
xiuqhou Dec 26, 2024
d0cae13
Add layer_norm_eps param
xiuqhou Dec 26, 2024
9ac4530
Update src/transformers/models/relation_detr/modeling_relation_detr.py
xiuqhou Dec 26, 2024
91f8896
Update src/transformers/models/relation_detr/modeling_relation_detr.py
xiuqhou Dec 26, 2024
4f66c7d
Remove empty line
xiuqhou Dec 26, 2024
5bce58d
Update examples
xiuqhou Dec 26, 2024
b5054ee
Remove masks in relation_detr tester
xiuqhou Dec 26, 2024
724b366
Rename embed_dim to d_model for consistency
xiuqhou Dec 26, 2024
7d965b2
Add more type hints
xiuqhou Dec 26, 2024
48fc2a9
Update docstring
xiuqhou Dec 26, 2024
f530bbb
Add backbone_features_format check
xiuqhou Dec 26, 2024
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -289,6 +289,7 @@ Flax), PyTorch, and/or TensorFlow.
| [RecurrentGemma](model_doc/recurrent_gemma) | ✅ | ❌ | ❌ |
| [Reformer](model_doc/reformer) | ✅ | ❌ | ❌ |
| [RegNet](model_doc/regnet) | ✅ | ✅ | ✅ |
| [RelationDETR](model_doc/relation_detr) | ✅ | ❌ | ❌ |
| [RemBERT](model_doc/rembert) | ✅ | ✅ | ❌ |
| [ResNet](model_doc/resnet) | ✅ | ✅ | ✅ |
| [RetriBERT](model_doc/retribert) | ✅ | ❌ | ❌ |
102 changes: 102 additions & 0 deletions docs/source/en/model_doc/relation_detr.md
@@ -0,0 +1,102 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Relation-DETR

## Overview


The Relation-DETR model was proposed in [Relation DETR: Exploring Explicit Position Relation Prior for Object Detection](https://arxiv.org/abs/2407.11699v1) by Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan.

Relation-DETR is a DETR-style object detector that adds an explicit position relation prior: pairwise box geometry is encoded into position relation embeddings that are used as an attention bias for progressive attention refinement in the decoder. This structural bias improves detection accuracy and substantially speeds up convergence compared to previous DETR variants, while keeping the end-to-end, post-processing-free detection pipeline.

The abstract from the paper is the following:

*This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment object detection, following the verification of its statistical significance using a proposed quantitative macroscopic correlation (MC) metric. Our approach, termed Relation-DETR, introduces an encoder to construct position relation embeddings for progressive attention refinement, which further extends the traditional streaming pipeline of DETR into a contrastive relation pipeline to address the conflicts between non-duplicate predictions and positive supervision. Extensive experiments on both generic and task-specific datasets demonstrate the effectiveness of our approach. Under the same configurations, Relation-DETR achieves a significant improvement (+2.0% AP compared to DINO), state-of-the-art performance (51.7% AP for 1x and 52.1% AP for 2x settings), and a remarkably faster convergence speed (over 40% AP with only 2 training epochs) than existing DETR detectors on COCO val2017. Moreover, the proposed relation encoder serves as a universal plug-in-and-play component, bringing clear improvements for theoretically any DETR-like methods. Furthermore, we introduce a class-agnostic detection dataset, SA-Det-100k. The experimental results on the dataset illustrate that the proposed explicit position relation achieves a clear improvement of 1.3% AP, highlighting its potential towards universal object detection.*

<img src="https://raw.githubusercontent.com/xiuqhou/Relation-DETR/refs/heads/main/images/convergence_curve.png"
alt="drawing" width="600"/>

<small> Performance comparison between Relation-DETR and other DETR methods. Taken from the <a href="https://arxiv.org/abs/2407.11699">original paper.</a> </small>

This model was contributed by [xiuqhou](https://github.com/xiuqhou). The original code can be found [here](https://github.com/xiuqhou/Relation-DETR/).
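
At the core of the method is the position relation prior: pairwise box geometry is encoded into position relation embeddings that are added as a bias to the decoder attention and recomputed from the auxiliary box predictions at each decoder layer ("progressive attention refinement"). The snippet below is a minimal, self-contained sketch of this kind of box-relation encoding; the module name, MLP parameterization, and scaling are illustrative and do not reproduce the official implementation.

```py
import torch
import torch.nn as nn


def box_relation_features(boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (num_queries, 4) in normalized (cx, cy, w, h) format."""
    cx, cy, w, h = boxes.unbind(-1)
    eps = 1e-5
    # Pairwise center offsets and size ratios, normalized by the size of box i (log-scaled).
    delta_x = torch.log(torch.abs(cx[:, None] - cx[None, :]) / (w[:, None] + eps) + 1.0)
    delta_y = torch.log(torch.abs(cy[:, None] - cy[None, :]) / (h[:, None] + eps) + 1.0)
    delta_w = torch.log((w[None, :] + eps) / (w[:, None] + eps))
    delta_h = torch.log((h[None, :] + eps) / (h[:, None] + eps))
    return torch.stack([delta_x, delta_y, delta_w, delta_h], dim=-1)  # (N, N, 4)


class PositionRelationBias(nn.Module):
    """Maps pairwise box geometry to one additive attention-bias term per head."""

    def __init__(self, num_heads: int = 8, hidden_dim: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_heads))

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        relation = box_relation_features(boxes)     # (N, N, 4)
        return self.mlp(relation).permute(2, 0, 1)  # (num_heads, N, N), added to the attention logits


boxes = torch.rand(5, 4)             # 5 dummy query boxes
bias = PositionRelationBias()(boxes)
print(bias.shape)                    # torch.Size([8, 5, 5])
```

In the actual model, this bias is rebuilt from the boxes predicted by the previous decoder layer, so the attention pattern is refined as the box predictions improve.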


## Usage tips

The image is first passed through a pre-trained convolutional backbone (a ResNet variant in the original code), and features are extracted from its last three stages. A transformer encoder then flattens these multi-scale feature maps into a sequence of image features. Finally, a decoder equipped with auxiliary prediction heads and position relation encoders refines the object queries and directly predicts class logits and bounding-box coordinates, so no additional post-processing is needed to obtain them.

```py
>>> import torch
>>> import requests

>>> from PIL import Image
>>> from transformers import RelationDetrForObjectDetection, RelationDetrImageProcessor

>>> url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
>>> image = Image.open(requests.get(url, stream=True).raw)

>>> image_processor = RelationDetrImageProcessor.from_pretrained("xiuqhou/relation-detr-resnet50")
>>> model = RelationDetrForObjectDetection.from_pretrained("xiuqhou/relation-detr-resnet50")

>>> inputs = image_processor(images=image, return_tensors="pt")

>>> with torch.no_grad():
... outputs = model(**inputs)

>>> results = image_processor.post_process_object_detection(outputs, target_sizes=torch.tensor([(image.height, image.width)]), threshold=0.3)

>>> for result in results:
... for score, label_id, box in zip(result["scores"], result["labels"], result["boxes"]):
... score, label = score.item(), label_id.item()
... box = [round(i, 2) for i in box.tolist()]
... print(f"{model.config.id2label[label]}: {score:.2f} {box}")
cat: 0.96 [343.8, 24.9, 639.52, 371.71]
cat: 0.95 [12.6, 54.34, 316.37, 471.86]
remote: 0.95 [40.09, 73.49, 175.52, 118.06]
remote: 0.90 [333.09, 76.71, 369.77, 187.4]
couch: 0.90 [0.45, 0.53, 640.44, 475.54]
```
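
This pull request also adds `RelationDetrImageProcessorFast`, a torchvision-backed image processor. Assuming it mirrors the slow processor's API (as the other fast DETR-style image processors in the library do), it can be used as a drop-in replacement:

```py
>>> from transformers import RelationDetrImageProcessorFast

>>> # Assumes the fast processor is loadable from the same checkpoint name.
>>> image_processor = RelationDetrImageProcessorFast.from_pretrained("xiuqhou/relation-detr-resnet50")
>>> inputs = image_processor(images=image, return_tensors="pt")
```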

## RelationDetrConfig

[[autodoc]] RelationDetrConfig

## RelationDetrResNetConfig

[[autodoc]] RelationDetrResNetConfig

## RelationDetrImageProcessor

[[autodoc]] RelationDetrImageProcessor
- preprocess
- post_process_object_detection

## RelationDetrImageProcessorFast

[[autodoc]] RelationDetrImageProcessorFast
- preprocess
- post_process_object_detection

## RelationDetrModel

[[autodoc]] RelationDetrModel
- forward

## RelationDetrForObjectDetection

[[autodoc]] RelationDetrForObjectDetection
- forward
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -714,6 +714,7 @@
"models.recurrent_gemma": ["RecurrentGemmaConfig"],
"models.reformer": ["ReformerConfig"],
"models.regnet": ["RegNetConfig"],
"models.relation_detr": ["RelationDetrConfig"],
"models.rembert": ["RemBertConfig"],
"models.resnet": ["ResNetConfig"],
"models.roberta": [
@@ -1250,6 +1251,7 @@
_import_structure["models.poolformer"].extend(["PoolFormerFeatureExtractor", "PoolFormerImageProcessor"])
_import_structure["models.pvt"].extend(["PvtImageProcessor"])
_import_structure["models.qwen2_vl"].extend(["Qwen2VLImageProcessor"])
_import_structure["models.relation_detr"].extend(["RelationDetrImageProcessor"])
_import_structure["models.rt_detr"].extend(["RTDetrImageProcessor"])
_import_structure["models.sam"].extend(["SamImageProcessor"])
_import_structure["models.segformer"].extend(["SegformerFeatureExtractor", "SegformerImageProcessor"])
@@ -1281,6 +1283,7 @@
_import_structure["models.deformable_detr"].append("DeformableDetrImageProcessorFast")
_import_structure["models.detr"].append("DetrImageProcessorFast")
_import_structure["models.pixtral"].append("PixtralImageProcessorFast")
_import_structure["models.relation_detr"].append("RelationDetrImageProcessorFast")
_import_structure["models.rt_detr"].append("RTDetrImageProcessorFast")
_import_structure["models.vit"].append("ViTImageProcessorFast")

@@ -3282,6 +3285,13 @@
"RegNetPreTrainedModel",
]
)
_import_structure["models.relation_detr"].extend(
[
"RelationDetrForObjectDetection",
"RelationDetrModel",
"RelationDetrPreTrainedModel",
]
)
_import_structure["models.rembert"].extend(
[
"RemBertForCausalLM",
@@ -5714,6 +5724,7 @@
from .models.recurrent_gemma import RecurrentGemmaConfig
from .models.reformer import ReformerConfig
from .models.regnet import RegNetConfig
from .models.relation_detr import RelationDetrConfig
from .models.rembert import RemBertConfig
from .models.resnet import ResNetConfig
from .models.roberta import (
@@ -6274,6 +6285,7 @@
)
from .models.pvt import PvtImageProcessor
from .models.qwen2_vl import Qwen2VLImageProcessor
from .models.relation_detr import RelationDetrImageProcessor
from .models.rt_detr import RTDetrImageProcessor
from .models.sam import SamImageProcessor
from .models.segformer import SegformerFeatureExtractor, SegformerImageProcessor
@@ -6301,6 +6313,7 @@
from .models.deformable_detr import DeformableDetrImageProcessorFast
from .models.detr import DetrImageProcessorFast
from .models.pixtral import PixtralImageProcessorFast
from .models.relation_detr import RelationDetrImageProcessorFast
from .models.rt_detr import RTDetrImageProcessorFast
from .models.vit import ViTImageProcessorFast

@@ -7906,6 +7919,11 @@
RegNetModel,
RegNetPreTrainedModel,
)
from .models.relation_detr import (
RelationDetrForObjectDetection,
RelationDetrModel,
RelationDetrPreTrainedModel,
)
from .models.rembert import (
RemBertForCausalLM,
RemBertForMaskedLM,
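
With the import structure above in place, the new public symbols become importable from the top-level `transformers` package. A quick sanity check, assuming an editable install of this branch (building a model from the default configuration is the usual convention for new models, not something verified here):

```py
from transformers import (
    RelationDetrConfig,
    RelationDetrForObjectDetection,
    RelationDetrImageProcessor,
    RelationDetrModel,
)

# Build a randomly initialized model from the default configuration (no weights are downloaded).
config = RelationDetrConfig()
model = RelationDetrForObjectDetection(config)
print(config.model_type, type(model).__name__)
```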