
Speed drop when running oneDNN in a subthread #1997

Open · matiaslin opened this issue Jul 16, 2024 · 9 comments
Labels: platform:x64 (Intel64/AMD64 processors), sighting (Suspicious library behavior; should be promoted to a bug when confirmed)

Comments


matiaslin commented Jul 16, 2024

Summary

Speed drop when running oneDNN in a subthread.

Version

oneDNN 3.4.2 with GNU OpenMP (4.5)

Environment

  • CPU: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
  • OS version: 3.10.0-1160.88.1.el7.x86_64 #1 SMP Tue Mar 7 15:41:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler version: 11.3.0

Steps to reproduce

See attached main.cpp.
main.cpp.txt

Observed behavior

We observe a ~13% speed drop when we perform a oneDNN matmul in a subthread.

                                  matmul (ms)      matmul async (ms)
(30000, 512) * (512, 2)            0.4086            0.4505

Expected behavior

We would expect a similar speed when running in a subthread.

@matiaslin matiaslin added the sighting Suspicious library behavior. Should be promoted to a bug when confirmed label Jul 16, 2024
dzarukin (Contributor)

Hi @matiaslin. From the reproducer shared, it seems the observations are based on a single time measurement. Have you tried performing multiple runs to stabilize the performance numbers?

I can suggest reusing this example, building an asynchronous execution on top of it, and verifying with the proposed methodology. Thanks.

matiaslin (Author)

Thanks for the recommendation, @dzarukin. I believe I do execute the matmul primitive multiple times to obtain the results posted above: in the TIME macro, the expression is repeated REPEAT times. Is this what you are referring to?

dzarukin (Contributor)

It is, yes. It seems I filtered the macro-style variable out... A couple more questions: why do you benchmark creation together with execution, is that intended? And what would be the proof that the unintended timing comes from the library and not from std::async overhead, which spends time submitting the code and then executing it?

Just to be sure: is this a single-threaded version?

WilliamTambellini (Contributor) commented Jul 17, 2024

why do you benchmark creation execution, is it intended?

We don't: the TIME() macro only measures execution time.

What would be the proof that unintended timing is coming from the library and not from std::async overhead which would spend time to submit the code and then execute it?

Because the TIME macro only repeats and measures the primitive execution.
Best

dzarukin (Contributor)

Please attach ONEDNN_VERBOSE=1 logs with 20 iterations for both modes.
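For context, ONEDNN_VERBOSE=1 is oneDNN's documented runtime switch that prints one line per primitive creation and execution, with timings. Capturing it for the reproducer might look like this (the `./main` binary name is an assumption):

```shell
# ONEDNN_VERBOSE=1 makes oneDNN print per-primitive create/exec timings
# to stdout; redirect both streams to collect the full log.
ONEDNN_VERBOSE=1 ./main > verbose.log 2>&1
```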

matiaslin (Author)

I ran with ONEDNN_VERBOSE=1 and REPEAT set to 20 for both modes. See attached verbose.log. Thank you!
verbose.log

dzarukin (Contributor)

Since the oneDNN execution doesn't change between the two modes, and given there are 8 threads, I would expect the async mode to introduce a resource issue, eventually over-subscription or something along those lines, because std threading is OMP-unaware.

I'd expect those numbers to align once oneDNN is built with a sequential runtime (with some potential delta due to async overhead).
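The sequential-runtime build being suggested is selected at configure time via oneDNN's documented DNNL_CPU_RUNTIME CMake option. A sketch of such a build (source/build directory names are assumptions):

```shell
# Configure oneDNN with the sequential CPU runtime (DNNL_CPU_RUNTIME=SEQ),
# so the library itself spawns no OpenMP threads when called from a subthread.
cmake -S oneDNN -B build-seq -DDNNL_CPU_RUNTIME=SEQ -DCMAKE_BUILD_TYPE=Release
cmake --build build-seq --parallel
```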

WilliamTambellini (Contributor)

Thanks @dzarukin.
We'll test with the SEQ build of oneDNN, but our goal is still to get normal speed when using oneDNN with OMP in a subthread.
Could you please run that main.cpp via Intel VTune and upload the results somewhere? We would do it ourselves, but these Xeon(R) Platinum 8375C CPUs are in the cloud, under a hypervisor, which prevents getting meaningful reports from VTune.
Best

vpirogov (Member)

@onednnsupporttriage, would you be able to help with this request?

@shu1chen shu1chen added the platform:x64 Intel64/AMD64 processors label Jul 19, 2024

7 participants