Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fast cuDNN BatchNorm NHWC kernels support #20615

Merged
merged 4 commits into from
Sep 30, 2021

Conversation

mk-61
Copy link
Contributor

@mk-61 mk-61 commented Sep 27, 2021

Description

This PR makes cuDNN-backed BatchNorm operator use newer API calls (cudnnBatchNormalizationForwardTrainingEx / cudnnBatchNormalizationBackwardEx), which bring in significant speed up in some cases (fp16 NHWC / NDHWC layouts).

I also refactored and simplified code a bit.

I tested fp16 NHWC speedup on my Layout Management feature branch (not up-streamed yet) on ResNet50 model.
The correctness should be covered by existing tests.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Make use of newer cuDNN API calls / new kernels
  • Refactoring

@DickJC123

@mxnet-bot
Copy link

Hey @mk-61 , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [sanity, windows-cpu, miscellaneous, website, windows-gpu, unix-gpu, centos-cpu, unix-cpu, clang, edge, centos-gpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 27, 2021
@mk-61 mk-61 changed the title Fast cuDNN NHWC kernels support Fast cuDNN BatchNorm NHWC kernels support Sep 27, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 27, 2021
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 28, 2021
@mk-61 mk-61 requested a review from szha as a code owner September 28, 2021 03:34
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 28, 2021
@mk-61
Copy link
Contributor Author

mk-61 commented Sep 28, 2021

@mxnet-bot run ci [centos-gpu, unix-cpu, website]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [centos-gpu, website, unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 28, 2021
@mk-61
Copy link
Contributor Author

mk-61 commented Sep 28, 2021

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-work-in-progress PR is still work in progress labels Sep 28, 2021
Copy link
Member

@ptrendx ptrendx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, did you also check the performance of NCHW case?

@mk-61
Copy link
Contributor Author

mk-61 commented Sep 28, 2021

LGTM, did you also check the performance of NCHW case?

You mean compared to functions without "Ex" suffix? Not, I haven't, can do if you like me to. Although I think the logic behind "Ex" functions is "make it faster in some case and fallback to the previous implementations otherwise". Specifically, I expected (and verified) speedup in FP16/NHWC, assumed it shouldn't regress in other cases, unless there's a bug, which cuDNN would need to fix.

@ptrendx
Copy link
Member

ptrendx commented Sep 28, 2021

Yeah, it would be good to check that NCHW does not regress.

@mseth10 mseth10 added pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 28, 2021
@mk-61
Copy link
Contributor Author

mk-61 commented Sep 28, 2021

Yeah, it would be good to check that NCHW does not regress.

Verified on RN50 / Volta - no regressions and the same kernels used, as far as nsys stats show.

@ptrendx
Copy link
Member

ptrendx commented Sep 28, 2021

@mxnet-bot run ci [unix-cpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-cpu]

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test labels Sep 28, 2021
@ptrendx ptrendx merged commit 23af413 into apache:master Sep 30, 2021
@ptrendx
Copy link
Member

ptrendx commented Sep 30, 2021

Thanks for the contribution!

chinakook pushed a commit to chinakook/mxnet that referenced this pull request Feb 11, 2022
* Fast cuDNN NHWC kernels support

* Fix lint errors

* Get rid of a warning

* Remove CuDNNBatchNorm from AMP lists

Co-authored-by: Vladimir Cherepanov <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants