stencil blocking may have foobarred performance... #261

Open · jeffhammond opened this issue Aug 16, 2017 · 8 comments

@jeffhammond (Member)

Need to investigate, but the recent commits have shown a massive regression in some cases.

jeffhammond added the C label Aug 16, 2017
jeffhammond self-assigned this Aug 16, 2017
@rfvander (Contributor)

That's strange. I used to have tiling for all three stencil implementations, in the good old days of SERIAL, MPI, and OpenMP. But I hardly ever saw a benefit, and it complicated the code, so I eliminated it for all but the serial implementation. In principle it should allow better reuse, but it takes a LARGE grid to see that happen. If performance drops precipitously because of it, there's a pathology (a bug).

@jeffhammond (Member, Author)

This was because omitting the blocking argument meant that measurements used star 2 instead of star 4, but we still have to deal with the fact that huge tile sizes led to inadequate parallelism. We should branch on (grid_size/tile_size)^2 < num_threads and not bother tiling in that case.
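A minimal sketch of that guard, assuming a PRK-style OpenMP driver (the names `n`, `tile_size`, and `use_tiling` are illustrative, not the actual PRK variables):

```c
/* Sketch only, not the actual PRK code: skip tiling whenever it
 * would leave threads without work. */
#include <omp.h>

static int use_tiling(int n, int tile_size)
{
    if (tile_size <= 1 || tile_size >= n) return 0;  /* tiling disabled or degenerate */
    long tiles_per_dim = n / tile_size;
    long ntiles = tiles_per_dim * tiles_per_dim;
    /* Fewer tiles than threads means some threads sit idle,
     * so fall back to the untiled sweep in that regime. */
    return ntiles >= omp_get_max_threads();
}
```

The untiled path keeps full parallelism on small grids, while large grids still get the cache reuse that tiling is meant to provide.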

@rfvander (Contributor)

Yes, we saw the same with transpose, as you may recall. But I wouldn't do anything automatic. Users should always be allowed to shoot themselves in the foot.

@rfvander (Contributor)

But maybe we can warn them of the bullet holes.

@rfvander (Contributor)

I meant to ask you if you ever get requests for box-shaped stencils (instead of star stencils). For the AMR code I effectively had to support that in MPI (too complicated to explain why, and not worth it), and it was actually very easy. I'd like to add that to our MPI variants.
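For reference, the difference between the two shapes comes down to which entries of the stencil weight table are populated. A sketch (RADIUS and the weight layout are assumptions for illustration, not PRK's actual initialization code):

```c
/* Illustrative only: populate a (2R+1)x(2R+1) weight table for a
 * star stencil (axes only) versus a box stencil (full square). */
#define RADIUS 2

static double W[2*RADIUS+1][2*RADIUS+1];

static void init_star(void)
{
    for (int i = 1; i <= RADIUS; i++) {
        double w = 1.0 / (2.0 * i * RADIUS);
        W[RADIUS][RADIUS+i] =  w;   /* east  */
        W[RADIUS][RADIUS-i] = -w;   /* west  */
        W[RADIUS+i][RADIUS] =  w;   /* north */
        W[RADIUS-i][RADIUS] = -w;   /* south */
    }
}

static void init_box(void)
{
    double w = 1.0 / ((2*RADIUS+1) * (2*RADIUS+1) - 1);
    for (int j = -RADIUS; j <= RADIUS; j++)
        for (int i = -RADIUS; i <= RADIUS; i++)
            if (i != 0 || j != 0)
                W[RADIUS+j][RADIUS+i] = w;  /* every neighbor in the square */
}
```

The sweep loop itself is unchanged; only the set of nonzero weights changes, though for MPI the halo exchange must also cover the corner neighbors.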

@jeffhammond (Member, Author) commented Aug 16, 2017 via email

@jeffhammond (Member, Author) commented Aug 16, 2017 via email

@jeffhammond (Member, Author) commented Aug 16, 2017 via email
