
My results are lower than yours. #22

Open
yuan738 opened this issue May 20, 2023 · 19 comments


@yuan738

yuan738 commented May 20, 2023

When I train the model with consistent_teacher_r50_fpn_coco_180k_10p_2x8.py on one GPU, the results are much too low, and I did not change any parameters.
[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.4 task/s, elapsed: 115s, ETA: 0s
2023-05-20 11:33:56,083 - mmdet.ssod - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=1.21s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=17.84s).
Accumulating evaluation results...
DONE (t=5.17s).
2023-05-20 11:34:22,052 - mmdet.ssod - INFO -
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.123
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.197
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.126
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.142
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.156
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.146
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.496

[>>>>>>>>>>>>>>>>>>>>>>>>>>] 5000/5000, 43.1 task/s, elapsed: 116s, ETA: 0s
2023-05-20 11:36:22,808 - mmdet.ssod - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=1.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=15.87s).
Accumulating evaluation results...
DONE (t=6.67s).
2023-05-20 11:36:48,355 - mmdet.ssod - INFO -
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.098
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.165
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.099
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.051
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.113
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.125
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.305
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.315
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.446

2023-05-20 11:36:48,895 - mmdet.ssod - INFO - Exp name: consistent_teacher_r50_fpn_coco_180k_10p_2x8.py
2023-05-20 11:36:48,898 - mmdet.ssod - INFO - Iter(val) [180000] teacher.bbox_mAP: 0.1230, teacher.bbox_mAP_50: 0.1971, teacher.bbox_mAP_75: 0.1262, teacher.bbox_mAP_s: 0.0674, teacher.bbox_mAP_m: 0.1415, teacher.bbox_mAP_l: 0.1562, teacher.bbox_mAP_copypaste: 0.1230 0.1971 0.1262 0.0674 0.1415 0.1562, student.bbox_mAP: 0.0984, student.bbox_mAP_50: 0.1653, student.bbox_mAP_75: 0.0991, student.bbox_mAP_s: 0.0513, student.bbox_mAP_m: 0.1129, student.bbox_mAP_l: 0.1249, student.bbox_mAP_copypaste: 0.0984 0.1653 0.0991 0.0513 0.1129 0.1249
wandb: Waiting for W&B process to finish... (success).

@Johnson-Wang
Collaborator

Sorry, this config is supposed to be run on 8 GPUs: the "2x8" in the name means samples_per_gpu=2 with a total of 8 GPUs.

@yuan738
Author

yuan738 commented May 20, 2023

Thanks. Is there any config that is meant to be run on one GPU? Or, which parameters should I change in the config?
I have changed the --gpus and --gpu-ids parameters in tools/train.py to use a single GPU.
Thanks!

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @yuan738,

Thank you for your insightful question. Currently, the semi-supervised method is heavily dependent on a large batch size, and as a result, reducing the number of GPUs could significantly impact performance. Unfortunately, we have not yet found an effective solution to this issue.

One potential workaround could be to implement "gradient accumulation," or you might consider using fp16 to increase the batch size on a single GPU. Moreover, given the limitations of single-GPU training, you might find that a 1:4 labeled-to-unlabeled ratio is too ambitious and could consider adjusting it to a 1:1 ratio. However, please note that the performance might still be subpar with a single-GPU setup. We acknowledge this challenge and will strive to address it in our future work.
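For readers in the same situation, below is a minimal sketch of what such single-GPU overrides could look like. It is an illustration under assumptions, not an official config from this repository: only data.samples_per_gpu and data.sampler.train.sample_ratio are field names confirmed in this thread, while the base file path and the fp16/gradient-accumulation hook follow generic MMDetection 2.x / mmcv conventions.

```python
# Hypothetical single-GPU override -- a sketch under assumptions, not an official recipe.
# Only data.samples_per_gpu and data.sampler.train.sample_ratio come from this thread;
# the rest assumes standard MMDetection 2.x / mmcv behaviour.

_base_ = "./consistent_teacher_r50_fpn_coco_180k_10p_2x8.py"  # assumed relative path

sample_ratio = [2, 2]  # 1:1 labeled:unlabeled, as suggested above

data = dict(
    samples_per_gpu=sum(sample_ratio),  # as large as a single GPU allows
    sampler=dict(train=dict(sample_ratio=sample_ratio)),
)

# fp16 + gradient accumulation in one optimizer hook (assumes an mmcv version that ships
# GradientCumulativeFp16OptimizerHook; if the base config already enables fp16 via a
# top-level fp16 dict, use GradientCumulativeOptimizerHook instead). cumulative_iters=8
# roughly emulates the effective batch of the original 8-GPU run; carry over grad_clip
# if the base config sets it.
optimizer_config = dict(
    type="GradientCumulativeFp16OptimizerHook",
    cumulative_iters=8,
    loss_scale="dynamic",
)
```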

Best Regards,

@zimenglan-sysu-512

Hi @Adamdad,
I trained the consistent_teacher_r50_fpn_coco_180k_10p.py config on 4 GPUs. At iteration 16000, the results are as below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.190
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.113
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.063
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.133
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.143
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.323
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.145
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.346
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439

It seems the mAP is too low.

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @zimenglan-sysu-512 ,

I recommend increasing the ratio of labeled to unlabeled samples, such as a 1:1 ratio. Currently, the learning rate is based on 8 labeled samples (1 labeled sample per GPU). If you are training with fewer GPUs, you may need to adjust the batch size for labeled samples or decrease the learning rate to match your setup.
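To make that concrete, here is a small worked example of the linear learning-rate scaling implied above. The base values (lr = 0.01 with 8 labeled samples per step) are taken from the comment; the 4-GPU numbers are purely illustrative.

```python
# Linear LR scaling with the labeled batch size -- an illustration only.
base_lr = 0.01
base_labeled_per_step = 8       # 8 GPUs x 1 labeled sample each (original setup)

num_gpus = 4                    # hypothetical reduced setup
labeled_per_gpu = 1             # e.g. sample_ratio = [1, 1] with samples_per_gpu = 2
labeled_per_step = num_gpus * labeled_per_gpu

scaled_lr = base_lr * labeled_per_step / base_labeled_per_step
print(scaled_lr)                # 0.005 -- halve the LR when the labeled batch is halved
```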

It's important to keep in mind that performance may still be lower with a reduced GPU setup. We are aware of this challenge and will make efforts to overcome it in our future endeavors.

Best.

@zimenglan-sysu-512

zimenglan-sysu-512 commented May 24, 2023

Hi @Adamdad,
Should I modify data.samples_per_gpu and data.sampler.train.sample_ratio to increase the number of labeled samples for 4-GPU training, and reduce the learning rate accordingly?
e.g.

data.samples_per_gpu=6
data.sampler.train.sample_ratio=[2, 4]
lr = 0.01 * (4 * 6) / (5 * 8)

@Adamdad
Owner

Adamdad commented May 24, 2023

Dear @zimenglan-sysu-512,

Yes, setting data.sampler.train.sample_ratio to [2, 4] or [3, 3] should work well. The first number represents the number of labeled samples, while the second number represents the number of unlabeled samples. Therefore, you will be using 2 or 3 labeled samples per batch per GPU. Make sure

data.samples_per_gpu = sum(data.sampler.train.sample_ratio).
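Put together, a hedged config fragment for the 4-GPU setup discussed above might look like the following. The field paths come from this thread; the base file name, its relative path, and the learning-rate rescaling are assumptions.

```python
# Sketch of a 4-GPU override for the 10% COCO setting -- not an official config.
_base_ = "./consistent_teacher_r50_fpn_coco_180k_10p.py"  # assumed path

sample_ratio = [2, 4]  # 2 labeled + 4 unlabeled images per GPU

data = dict(
    samples_per_gpu=sum(sample_ratio),  # must equal 6 here
    sampler=dict(train=dict(sample_ratio=sample_ratio)),
)

# Optional: rescale the LR with the total batch size, as proposed earlier in the thread.
optimizer = dict(lr=0.01 * (4 * 6) / (8 * 5))  # = 0.006
```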

Best,

@zimenglan-sysu-512

zimenglan-sysu-512 commented May 25, 2023

Hi @Adamdad,
[screenshot omitted: error loading a 360k full-data config]
It seems that this config file cannot be found.

@Adamdad
Owner

Adamdad commented May 25, 2023

We only have a file called configs/consistent-teacher/consistent_teacher_r50_fpn_coco_720k_fulldata.py. No config is provided for 360k training on the full data.

@zimenglan-sysu-512

Thanks. Another question: where can I find the configs/consistent-teacher/base.py file?

@Adamdad
Owner

Adamdad commented May 25, 2023

Dear @zimenglan-sysu-512,

Apologies for any confusion. The file configs/consistent-teacher/base.py has been renamed to configs/consistent-teacher/consistent_teacher_r50_fpn_coco_180k_10p.py, and the README has been updated to reflect this change.

Best regards,

@zimenglan-sysu-512

Hi @Adamdad,
Two questions here:

  1. From the log file, the learning rate stays the same throughout the training phase. Why is that?
  2. What is the difference between 360k and 720k iterations, and how do they compare in mAP? With only a few GPUs (e.g. 4), training for 720k iterations takes more than 10 days to finish, even with fp16.

@Adamdad
Owner

Adamdad commented May 26, 2023

Dear @zimenglan-sysu-512,

  1. In our experiments, we decided not to decay the learning rate. Surprisingly, we observed that using a fixed learning rate resulted in higher performance than using a learning rate decay schedule (see the sketch after this list).
  2. Unfortunately, we did not run the 360k-iteration experiments, so we cannot comment on the performance gap between the two training lengths. However, training the model for 720k iterations is indeed quite time-consuming; even with 8xV100 GPUs, it still takes several days to complete.
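As a rough illustration of point 1, the two schedules could be expressed with standard mmcv lr_config policies as follows; these fragments assume generic mmcv usage and are not taken from this repository's actual configs.

```python
# Hypothetical lr_config fragments using standard mmcv LrUpdaterHook policies;
# the repository's actual schedule (and any warmup settings) may differ.

# Constant learning rate, matching the "no decay" choice described above:
lr_config = dict(policy="fixed")

# A conventional step decay, shown only for contrast (reported to work worse here):
# lr_config = dict(policy="step", step=[120000, 160000])
```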

If you have any further questions or need additional information, please let me know.

Best regards,

@zimenglan-sysu-512


Using 4 GPUs with a batch size of 6 per GPU (sample_ratio = [2, 4]), the results are below:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.381
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.542
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.211
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.570
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.343
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.609
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.740

@Adamdad
Owner

Adamdad commented May 28, 2023

Dear @zimenglan-sysu-512,
The results you provided are amazing. It would be very helpful if you could share the config and checkpoint; this experiment could be extremely useful for people with fewer GPUs. You can open a pull request or send the files to me.

Great 😃

@zimenglan-sysu-512

Hi @Adamdad, I will send the config, log, and checkpoint files to you. Please check the QQ email.

@zimenglan-sysu-512

Hi @Adamdad,
Why does the full-data config freeze the BN layers in the R50 backbone, while the 10% config does not?

@cyn-liu

cyn-liu commented Jan 11, 2024

Dear @zimenglan-sysu-512,
Could you share your config file with me? It would be of great help; I only have 2 GPUs and am getting bad training results.
My QQ mail is [email protected]

@xiaofu3322

Dear @zimenglan-sysu-512,
Could you share your config file with me? It would be of great help to me. My QQ mail is [email protected]
