
My results are lower than yours. #22

Open
yuan738 opened this issue May 20, 2023 · 19 comments


@yuan738

yuan738 commented May 20, 2023

When I train the model with consistent_teacher_r50_fpn_coco_180k_10p_2x8.py on one GPU, the results are much too low, and I did not change any parameters.
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.4 task/s, elapsed: 115s, ETA: 0s
2023-05-20 11:33:56,083 - mmdet.ssod - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=1.21s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=17.84s).
Accumulating evaluation results...
DONE (t=5.17s).
2023-05-20 11:34:22,052 - mmdet.ssod - INFO -
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.123
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.197
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.126
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.156
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.496

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.1 task/s, elapsed: 116s, ETA: 0s
2023-05-20 11:36:22,808 - mmdet.ssod - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=1.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=15.87s).
Accumulating evaluation results...
DONE (t=6.67s).
2023-05-20 11:36:48,355 - mmdet.ssod - INFO -
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.098
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.165
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.099
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.446

2023-05-20 11:36:48,895 - mmdet.ssod - INFO - Exp name: consistent_teacher_r50_fpn_coco_180k_10p_2x8.py
2023-05-20 11:36:48,898 - mmdet.ssod - INFO - Iter(val) [180000] teacher.bbox_mAP: 0.1230, teacher.bbox_mAP_50: 0.1971, teacher.bbox_mAP_75: 0.1262, teacher.bbox_mAP_s: 0.0674, teacher.bbox_mAP_m: 0.1415, teacher.bbox_mAP_l: 0.1562, teacher.bbox_mAP_copypaste: 0.1230 0.1971 0.1262 0.0674 0.1415 0.1562, student.bbox_mAP: 0.0984, student.bbox_mAP_50: 0.1653, student.bbox_mAP_75: 0.0991, student.bbox_mAP_s: 0.0513, student.bbox_mAP_m: 0.1129, student.bbox_mAP_l: 0.1249, student.bbox_mAP_copypaste: 0.0984 0.1653 0.0991 0.0513 0.1129 0.1249
wandb: Waiting for W&B process to finish... (success).

@Johnson-Wang
Collaborator

Sorry, this config is supposed to be run on 8 GPUs: the "2x8" in the name means samples_per_gpu=2 with a total of 8 GPUs.

@yuan738
Author

yuan738 commented May 20, 2023

Thanks. Is there any config that is meant to be run on one GPU? Or, which parameters should I change in the config?
I have changed the --gpus and --gpu-ids parameters in tools/train.py to use a single GPU.
Thanks!

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @yuan738,

Thank you for your insightful question. Currently, the semi-supervised method is heavily dependent on a large batch size, and as a result, reducing the number of GPUs could significantly impact performance. Unfortunately, we have not yet found an effective solution to this issue.

One potential workaround could be to implement "gradient accumulation," or you might consider using fp16 to increase the batch size on a single GPU. Moreover, given the limitations of single-GPU training, you might find that a 1:4 labeled-to-unlabeled ratio is too ambitious and could consider adjusting it to a 1:1 ratio. However, please note that the performance might still be subpar with a single-GPU setup. We acknowledge this challenge and will strive to address it in our future work.
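For readers in the same situation, below is a minimal sketch of what such single-GPU overrides could look like. It is an illustration under assumptions, not an official config from this repository: only data.samples_per_gpu and data.sampler.train.sample_ratio are field names confirmed in this thread, while the base file path and the fp16/gradient-accumulation hook follow generic MMDetection 2.x / mmcv conventions.

```python
# Hypothetical single-GPU override -- a sketch under assumptions, not an official recipe.
# Only data.samples_per_gpu and data.sampler.train.sample_ratio come from this thread;
# the rest assumes standard MMDetection 2.x / mmcv behaviour.

_base_ = "./consistent_teacher_r50_fpn_coco_180k_10p_2x8.py"  # assumed relative path

sample_ratio = [2, 2]  # 1:1 labeled:unlabeled, as suggested above

data = dict(
    samples_per_gpu=sum(sample_ratio),  # as large as a single GPU allows
    sampler=dict(train=dict(sample_ratio=sample_ratio)),
)

# fp16 + gradient accumulation in one optimizer hook (assumes an mmcv version that ships
# GradientCumulativeFp16OptimizerHook; if the base config already enables fp16 via a
# top-level fp16 dict, use GradientCumulativeOptimizerHook instead). cumulative_iters=8
# roughly emulates the effective batch of the original 8-GPU run; carry over grad_clip
# if the base config sets it.
optimizer_config = dict(
    type="GradientCumulativeFp16OptimizerHook",
    cumulative_iters=8,
    loss_scale="dynamic",
)
```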

Best Regards,

@zimenglan-sysu-512

Hi @Adamdad,
I trained the consistent_teacher_r50_fpn_coco_180k_10p.py config on 4 GPUs. At iteration 16000, the results are as below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.190
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.133
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.143
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.145
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439

It seems the mAP is too low.

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @zimenglan-sysu-512 ,

I recommend increasing the ratio of labeled to unlabeled samples, such as a 1:1 ratio. Currently, the learning rate is based on 8 labeled samples (1 labeled sample per GPU). If you are training with fewer GPUs, you may need to adjust the batch size for labeled samples or decrease the learning rate to match your setup.
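To make that concrete, here is a small worked example of the linear learning-rate scaling implied above. The base values (lr = 0.01 with 8 labeled samples per step) are taken from the comment; the 4-GPU numbers are purely illustrative.

```python
# Linear LR scaling with the labeled batch size -- an illustration only.
base_lr = 0.01
base_labeled_per_step = 8       # 8 GPUs x 1 labeled sample each (original setup)

num_gpus = 4                    # hypothetical reduced setup
labeled_per_gpu = 1             # e.g. sample_ratio = [1, 1] with samples_per_gpu = 2
labeled_per_step = num_gpus * labeled_per_gpu

scaled_lr = base_lr * labeled_per_step / base_labeled_per_step
print(scaled_lr)                # 0.005 -- halve the LR when the labeled batch is halved
```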

It's important to keep in mind that performance may still be lower with a reduced GPU setup. We are aware of this challenge and will make efforts to overcome it in our future endeavors.

Best.

@zimenglan-sysu-512

zimenglan-sysu-512 commented May 24, 2023

Hi @Adamdad,
Should I modify data.samples_per_gpu and data.sampler.train.sample_ratio to increase the number of labeled samples for 4-GPU training, and reduce the learning rate accordingly?
e.g.

data.samples_per_gpu=6
data.sampler.train.sample_ratio=[2, 4]
lr = 0.01 * (4 * 6) / (5 * 8)

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @zimenglan-sysu-512,

Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure

data.samples_per_gpu = sum(data.sampler.train.sample_ratio).
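Put together, a hedged config fragment for the 4-GPU setup discussed above might look like the following. The field paths come from this thread; the base file name, its relative path, and the learning-rate rescaling are assumptions.

```python
# Sketch of a 4-GPU override for the 10% COCO setting -- not an official config.
_base_ = "./consistent_teacher_r50_fpn_coco_180k_10p.py"  # assumed path

sample_ratio = [2, 4]  # 2 labeled + 4 unlabeled images per GPU

data = dict(
    samples_per_gpu=sum(sample_ratio),  # must equal 6 here
    sampler=dict(train=dict(sample_ratio=sample_ratio)),
)

# Optional: rescale the LR with the total batch size, as proposed earlier in the thread.
optimizer = dict(lr=0.01 * (4 * 6) / (8 * 5))  # = 0.006
```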

Best,

@zimenglan-sysu-512

zimenglan-sysu-512 commented May 25, 2023

Hi @Adamdad,
[screenshot omitted: error loading a 360k full-data config]
It seems that this config file cannot be found.

@Adamdad
Owner

Adamdad commented May 25, 2023

We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on the full data.

@zimenglan-sysu-512

Thanks. Another question: where can I find the configs/consistent-teacher/base.py file?

@Adamdad
Owner

Adamdad commented May 25, 2023

Dear @zimenglan-sysu-512,

Apologies for any confusion. The file configs/consistent-teacher/base.py has been renamed to configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py, and the README has been updated to reflect this change.

Best regards,

@zimenglan-sysu-512

Hi @Adamdad,
Two questions here:

  1. From the log file, the learning rate stays the same throughout the training phase. Why is that?
  2. What is the difference between 360k and 720k iterations, and how do they compare in mAP? With only a few GPUs (e.g. 4), training for 720k iterations takes more than 10 days to finish, even with fp16.

@Adamdad
Owner

Adamdad commented May 26, 2023

Dear @zimenglan-sysu-512,

  1. In our experiments, we decided not to decay the learning rate. Surprisingly, we observed that using a fixed learning rate resulted in higher performance than using a learning rate decay schedule (see the sketch after this list).
  2. Unfortunately, we did not run the 360k-iteration experiments, so we cannot comment on the performance gap between the two training lengths. However, training the model for 720k iterations is indeed quite time-consuming; even with 8xV100 GPUs, it still takes several days to complete.
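As a rough illustration of point 1, the two schedules could be expressed with standard mmcv lr_config policies as follows; these fragments assume generic mmcv usage and are not taken from this repository's actual configs.

```python
# Hypothetical lr_config fragments using standard mmcv LrUpdaterHook policies;
# the repository's actual schedule (and any warmup settings) may differ.

# Constant learning rate, matching the "no decay" choice described above:
lr_config = dict(policy="fixed")

# A conventional step decay, shown only for contrast (reported to work worse here):
# lr_config = dict(policy="step", step=[120000, 160000])
```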

If you have any further questions or need additional information, please let me know.

Best regards,

@zimenglan-sysu-512


Using 4 GPUs with a batch size of 6 per GPU (sample_ratio = [2, 4]), the results are below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.542
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.211
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.343
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740

@Adamdad
Owner

Adamdad commented May 28, 2023

Dear @zimenglan-sysu-512,
The results you provided are amazing. It would be very helpful if you could share the config and checkpoint; this experiment could be extremely useful for people with fewer GPUs. You can open a pull request or send the files to me.

Great 😃

@zimenglan-sysu-512

Hi @Adamdad, I will send the config, log, and checkpoint files to you. Please check the QQ email.

@zimenglan-sysu-512

Hi @Adamdad,
Why does the full-data config freeze the BN layers in the R50 backbone, while the 10% config does not?

@cyn-liu

cyn-liu commented Jan 11, 2024

Dear @zimenglan-sysu-512,
Could you share your config file with me? It would be of great help; I only have 2 GPUs and am getting bad training results.
My QQ mail is [email protected]

@xiaofu3322

Dear @zimenglan-sysu-512,
Could you share your config file with me? It would be of great help to me. My QQ mail is [email protected]
