Add Relation DETR #34900

Open

xiuqhou wants to merge 79 commits into main from add_relation_detr

Conversation

@xiuqhou commented Nov 24, 2024

What does this PR do?

This PR adds Relation-DETR as introduced in Relation DETR: Exploring Explicit Position Relation Prior for Object Detection. A checkpoint for Relation-DETR (ResNet-50), converted from the original repo https://github.com/xiuqhou/Relation-DETR, has been uploaded to https://huggingface.co/xiuqhou/relation-detr-resnet50.
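For reference, a minimal usage sketch with the uploaded checkpoint. This assumes the PR registers Relation-DETR with the auto classes, which is my assumption, not something stated in the PR itself:

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

# Hypothetical usage once this PR is merged; the checkpoint name comes from the link above.
processor = AutoImageProcessor.from_pretrained("xiuqhou/relation-detr-resnet50")
model = AutoModelForObjectDetection.from_pretrained("xiuqhou/relation-detr-resnet50")

image = Image.open("example.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# DETR-style post-processing into scores, labels, and boxes.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=target_sizes
)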

Related issues in original repo:
xiuqhou/Relation-DETR#25
xiuqhou/Relation-DETR#21

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

TODO:

  • Make more checkpoints with Swin-L and Focal-L backbones available on HF.
  • Update the documentation for Relation-DETR.

Who can review?

@amyeroberts @qubvel

@xiuqhou changed the title from "Add Relation DETR" to "[WIP] Add Relation DETR" on Nov 24, 2024
@xiuqhou marked this pull request as draft on November 24, 2024 15:16
@xiuqhou marked this pull request as ready for review on November 25, 2024 08:37
@xiuqhou changed the title from "[WIP] Add Relation DETR" to "Add Relation DETR" on Nov 25, 2024
@qubvel (Member) commented Nov 25, 2024

Hi @xiuqhou! Congratulations on the paper, awesome work! And thanks for working on the transformers implementation! Feel free to ping me when it's ready for review or if you have any questions!

@xiuqhou force-pushed the add_relation_detr branch 5 times, most recently from 14308cf to ce63725 on November 28, 2024 07:42
@xiuqhou (Author) commented Nov 28, 2024

Hi @qubvel, thanks for your support! The code is now ready for review; I'd greatly appreciate it if you could take a look and share your feedback. Please let me know if there's anything that needs improvement.

@qubvel self-requested a review on November 28, 2024 09:19
@xiuqhou (Author) commented Dec 28, 2024

Hi @qubvel, thank you very much for your careful review! 🤗 I have updated the code accordingly and left responses to your comments to clarify a few questions. Please let me know if there is anything else that needs to be improved.

@xiuqhou requested a review from qubvel on December 28, 2024 08:00
@qubvel (Member) commented Jan 10, 2025

Hi @xiuqhou, Happy New Year! Thanks for addressing the comments, and sorry for the delay. I'm going to review it right now. 🤗

Comment on lines 959 to 1011
# prepare (COCO annotations as a list of Dict -> DETR target as a single Dict per image)
if annotations is not None:
    prepared_images = []
    prepared_annotations = []
    for image, target in zip(images, annotations):
        target = self.prepare_annotation(
            image,
            target,
            format,
            input_data_format=input_data_format,
        )
        prepared_images.append(image)
        prepared_annotations.append(target)
    images = prepared_images
    annotations = prepared_annotations
    del prepared_images, prepared_annotations

# transformations
if do_resize:
    if annotations is not None:
        resized_images, resized_annotations = [], []
        for image, target in zip(images, annotations):
            orig_size = get_image_size(image, input_data_format)
            resized_image = self.resize(
                image, size=size, resample=resample, input_data_format=input_data_format
            )
            resized_annotation = self.resize_annotation(
                target, orig_size, get_image_size(resized_image, input_data_format)
            )
            resized_images.append(resized_image)
            resized_annotations.append(resized_annotation)
        images = resized_images
        annotations = resized_annotations
        del resized_images, resized_annotations
    else:
        images = [
            self.resize(image, size=size, resample=resample, input_data_format=input_data_format)
            for image in images
        ]

if do_rescale:
    images = [self.rescale(image, rescale_factor, input_data_format=input_data_format) for image in images]

if do_normalize:
    images = [
        self.normalize(image, image_mean, image_std, input_data_format=input_data_format) for image in images
    ]

if do_convert_annotations and annotations is not None:
    annotations = [
        self.normalize_annotation(annotation, get_image_size(image, input_data_format))
        for annotation, image in zip(annotations, images)
    ]
@qubvel (Member) commented Jan 10, 2025

Let's put this into a single loop:

processed_images = []
processed_annotations = []

for i, image in enumerate(images):
    annotation = annotations[i] if annotations is not None else None

    if resize:
        image = self.resize(...)
        if annotation is not None:
            annotation = ...
    ...

    processed_images.append(image)
    if annotation is not None:
        processed_annotations.append(annotation)

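A fuller sketch of that restructuring, reusing the methods from the hunk above (prepare_annotation, resize, resize_annotation, rescale, normalize, normalize_annotation) and the same do_* flags; this is how the whole block could collapse into one pass:

# Single pass over images; each annotation is kept in lockstep with its image.
processed_images = []
processed_annotations = []

for i, image in enumerate(images):
    annotation = annotations[i] if annotations is not None else None

    if annotation is not None:
        annotation = self.prepare_annotation(
            image, annotation, format, input_data_format=input_data_format
        )

    if do_resize:
        orig_size = get_image_size(image, input_data_format)
        image = self.resize(image, size=size, resample=resample, input_data_format=input_data_format)
        if annotation is not None:
            annotation = self.resize_annotation(
                annotation, orig_size, get_image_size(image, input_data_format)
            )

    if do_rescale:
        image = self.rescale(image, rescale_factor, input_data_format=input_data_format)

    if do_normalize:
        image = self.normalize(image, image_mean, image_std, input_data_format=input_data_format)

    if do_convert_annotations and annotation is not None:
        annotation = self.normalize_annotation(annotation, get_image_size(image, input_data_format))

    processed_images.append(image)
    if annotation is not None:
        processed_annotations.append(annotation)

images = processed_images
annotations = processed_annotations if annotations is not None else None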
@qubvel (Member) left a comment

Thanks, I did another round of review! It looks clean. There are mostly small comments on my side; please see them below. Next time I'm going to try fine-tuning, and if everything is good, pass it to the core maintainers for approval to be merged 🤗

backbone_kwargs=backbone_kwargs,
)

assert backbone_features_format in ["channels_first", "channels_last"], (
@qubvel (Member):

Let's use raise instead of assert.
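For example (a sketch, keeping the variable name from the snippet above):

if backbone_features_format not in ["channels_first", "channels_last"]:
    raise ValueError(
        "backbone_features_format should be 'channels_first' or 'channels_last', "
        f"got {backbone_features_format}."
    )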

Comment on lines 88 to 89
if is_timm_available():
    pass
@qubvel (Member):

to remove

Comment on lines 286 to 296
init_reference_points: torch.FloatTensor = None
dec_outputs_class: torch.FloatTensor = None
dec_outputs_coord: torch.FloatTensor = None
enc_outputs_class: torch.FloatTensor = None
enc_outputs_coord: torch.FloatTensor = None
last_hidden_state: torch.FloatTensor = None
intermediate_hidden_states: torch.FloatTensor = None
intermediate_reference_points: torch.FloatTensor = None
decoder_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
decoder_attentions: Optional[Tuple[torch.FloatTensor]] = None
cross_attentions: Optional[Tuple[torch.FloatTensor]] = None
@qubvel (Member):

Can we use an order similar to other models? E.g., last_hidden_state should be first.
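For example, an ordering closer to other DETR-style outputs might look like this (a sketch; only the field order changes, with last_hidden_state first):

last_hidden_state: torch.FloatTensor = None
intermediate_hidden_states: torch.FloatTensor = None
intermediate_reference_points: torch.FloatTensor = None
init_reference_points: torch.FloatTensor = None
dec_outputs_class: torch.FloatTensor = None
dec_outputs_coord: torch.FloatTensor = None
enc_outputs_class: torch.FloatTensor = None
enc_outputs_coord: torch.FloatTensor = None
decoder_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
decoder_attentions: Optional[Tuple[torch.FloatTensor]] = None
cross_attentions: Optional[Tuple[torch.FloatTensor]] = None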

super().__init__()
self.in_channels = in_channels
self.post_layer_norm = post_layer_norm
if self.post_layer_norm:
@qubvel (Member):

Do we have pretrained checkpoints for both cases? Am I right that if post_layer_norm=False, this module does nothing or just transposes? Let's just skip this layer in the parent module instead, then.
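One way to do that (a sketch; PostBackboneNorm is a hypothetical name for the module discussed here):

import torch.nn as nn

# In the parent module: build the norm only when it is actually used, and
# fall back to nn.Identity so the forward pass needs no branching.
self.post_norm = (
    PostBackboneNorm(in_channels, backbone_features_format)
    if config.post_layer_norm
    else nn.Identity()
)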

Comment on lines 527 to 540
if self.post_layer_norm:
    if self.backbone_features_format == "channels_first":
        # convert (batch_size, channels, height, width) -> (batch_size, height, width, channels)
        multi_level_feats = [feat.permute(0, 2, 3, 1) for feat in multi_level_feats]

        for idx, feat in enumerate(multi_level_feats):
            multi_level_feats[idx] = self.norms[idx](feat)

        # convert (batch_size, height, width, channels) -> (batch_size, channels, height, width)
        multi_level_feats = [feat.permute(0, 3, 1, 2) for feat in multi_level_feats]
    else:
        for idx, feat in enumerate(multi_level_feats):
            multi_level_feats[idx] = self.norms[idx](feat)

@qubvel (Member):

Suggested change:

-if self.post_layer_norm:
-    if self.backbone_features_format == "channels_first":
-        # convert (batch_size, channels, height, width) -> (batch_size, height, width, channels)
-        multi_level_feats = [feat.permute(0, 2, 3, 1) for feat in multi_level_feats]
-        for idx, feat in enumerate(multi_level_feats):
-            multi_level_feats[idx] = self.norms[idx](feat)
-        # convert (batch_size, height, width, channels) -> (batch_size, channels, height, width)
-        multi_level_feats = [feat.permute(0, 3, 1, 2) for feat in multi_level_feats]
-    else:
-        for idx, feat in enumerate(multi_level_feats):
-            multi_level_feats[idx] = self.norms[idx](feat)
+if self.post_layer_norm and self.backbone_features_format == "channels_first":
+    # convert (batch_size, channels, height, width) -> (batch_size, height, width, channels)
+    multi_level_feats = [feat.permute(0, 2, 3, 1) for feat in multi_level_feats]
+    for idx, feat in enumerate(multi_level_feats):
+        multi_level_feats[idx] = self.norms[idx](feat)
+    # convert (batch_size, height, width, channels) -> (batch_size, channels, height, width)
+    multi_level_feats = [feat.permute(0, 3, 1, 2) for feat in multi_level_feats]
+elif self.post_layer_norm:
+    for idx, feat in enumerate(multi_level_feats):
+        multi_level_feats[idx] = self.norms[idx](feat)

# When using clones, all layers > 0 will be clones, but layer 0 *is* required
# _tied_weights_keys = [r"bbox_head\.[1-9]\d*", r"class_head\.[1-9]\d*"]
# We can't initialize the model on meta device as some weights are modified during the initialization
_no_split_modules = None
@qubvel (Member):

Please specify the layers; it should be all layer class names that contain residual connections, like *EncoderLayer, *DecoderLayer.
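For example (assuming the layer classes follow the usual naming pattern; the exact class names in this PR may differ):

# Modules with residual connections should not be split across devices.
_no_split_modules = ["RelationDetrEncoderLayer", "RelationDetrDecoderLayer"]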

@@ -0,0 +1,741 @@
# coding=utf-8
# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
@qubvel (Member):

or maybe 2025 already

Suggested change:

-# Copyright 2022 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.

@qubvel (Member):

Let's also add # Copied from for non-modified tests.

@@ -0,0 +1,485 @@
# coding=utf-8
# Copyright 2022 HuggingFace Inc.
@qubvel (Member):

Suggested change:

-# Copyright 2022 HuggingFace Inc.
+# Copyright 2024 HuggingFace Inc.

Comment on lines 1229 to 1230


@qubvel (Member):

We probably need head initialization for fine-tuning; see RT-DETR's _init_weights for an example.
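A sketch in the spirit of RT-DETR's _init_weights, using the class_head/bbox_head names from the commented-out _tied_weights_keys above; the model class name and the head structure (e.g. .layers[-1]) are my assumptions:

import math
import torch.nn as nn

def _init_weights(self, module):
    # Bias the classification heads toward a small prior probability and zero
    # the final bbox-regression layer, as RT-DETR does, so fine-tuning starts
    # from a stable state.
    if isinstance(module, RelationDetrForObjectDetection):  # hypothetical class name
        prior_prob = 0.01
        bias_value = -math.log((1 - prior_prob) / prior_prob)
        for class_head in module.class_head:
            nn.init.constant_(class_head.bias, bias_value)
        for bbox_head in module.bbox_head:
            nn.init.constant_(bbox_head.layers[-1].weight, 0.0)
            nn.init.constant_(bbox_head.layers[-1].bias, 0.0)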
