
Avoid leaking submitted command encoders #5141

Merged

3 commits merged into gfx-rs:trunk on Jun 24, 2024

Conversation

nical
Contributor

@nical nical commented Jan 26, 2024

Connections

Part of #4660 got accidentally clobbered by arcanization

Description

Unlike wgpu-core, wgpu consumes command buffers during submit, so it needs to drop them.
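The ownership contract in question can be sketched with toy stand-in types (these are illustrative only, not the real wgpu `Queue`/`CommandBuffer`): `submit` takes the command buffers by value, so they are dropped, and must release their backing resources, by the time the call returns.

```rust
// Toy stand-ins for the ownership contract: submit() consumes the
// command buffers, so they are dropped (and must release their backing
// wgpu-core resources) by the time the call returns.
struct CommandBuffer {
    id: u32,
}

impl Drop for CommandBuffer {
    fn drop(&mut self) {
        // In real wgpu this is where the underlying wgpu-core command
        // buffer would be dropped so its registry slot can be cleared.
        println!("dropping command buffer {}", self.id);
    }
}

struct Queue;

impl Queue {
    // Taking the buffers by value mirrors the shape of wgpu's
    // `Queue::submit`: the caller gives up ownership.
    fn submit(&self, buffers: impl IntoIterator<Item = CommandBuffer>) -> usize {
        let mut count = 0;
        for _buf in buffers {
            // ... hand the buffer to the backend ...
            count += 1;
            // `_buf` is dropped at the end of this iteration.
        }
        count
    }
}

fn main() {
    let queue = Queue;
    let cb = CommandBuffer { id: 1 };
    assert_eq!(queue.submit(Some(cb)), 1);
    // `cb` has been moved into submit(); using it here would not compile.
}
```

The leak this PR fixes happened on the wgpu side of exactly this boundary: the wrapper consumed the buffers but never told wgpu-core to drop them.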

Checklist

  • Run cargo fmt.
  • Run cargo clippy.
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@nical nical requested a review from a team as a code owner January 26, 2024 08:16
Member

@cwfitzgerald cwfitzgerald left a comment


Nice

@cwfitzgerald cwfitzgerald enabled auto-merge (squash) January 26, 2024 08:29
@nical
Contributor Author

nical commented Jan 26, 2024

It breaks the multithreaded_compute test.

@nical
Contributor Author

nical commented Jan 26, 2024

I suspect that the error is that multithreaded_compute is racing to create command buffers in the same slot:

[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(0,1,gl)  <---
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(0,2,gl)  <---
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(1,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(2,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(3,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(4,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(5,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is inserting CommandBufferId(6,1,gl)
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is removing CommandBufferId(0,1,gl) <---
[2024-01-26T09:34:35Z TRACE wgpu_core::storage] User is removing CommandBufferId(0,2,gl) <--- Boom.

@nical
Contributor Author

nical commented Jan 26, 2024

What's happening is:

  • The command buffer is used in a submission:
    • submission snatches it from the registry and leaves its spot in an occupied error state;
    • the command buffer's ResourceInfo gets dropped, which frees the ID too soon, since the registry still holds a spot for the command buffer and we haven't dropped the command buffer itself.
  • After that, a variety of things can go wrong depending on scheduling; the one I'm observing most often is that a new command buffer gets placed in the yet-to-be-dropped spot in the registry.

So we have to ensure that the ResourceInfo doesn't get dropped before we've had a chance to call command_buffer_drop and clear out the spot in the registry.
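The failure mode can be sketched with a toy index+epoch registry (purely illustrative; wgpu_core's real storage and registry types are more involved): when the ID is recycled before its slot is cleared, the next insert lands on the yet-to-be-dropped slot, matching the (0,1)/(0,2) collision in the trace above.

```rust
// Toy slot registry with (index, epoch) IDs, illustrating why recycling
// an ID before its slot is cleared is dangerous. Not the real
// wgpu_core::storage implementation.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Id {
    index: usize,
    epoch: u32,
}

struct Registry<T> {
    slots: Vec<Option<(u32, T)>>, // (epoch, value)
    free: Vec<Id>,                // recycled IDs
}

impl<T> Registry<T> {
    fn new() -> Self {
        Self { slots: Vec::new(), free: Vec::new() }
    }

    fn insert(&mut self, value: T) -> Id {
        if let Some(old) = self.free.pop() {
            // Reuse the index with a bumped epoch.
            let id = Id { index: old.index, epoch: old.epoch + 1 };
            self.slots[id.index] = Some((id.epoch, value));
            id
        } else {
            let id = Id { index: self.slots.len(), epoch: 1 };
            self.slots.push(Some((id.epoch, value)));
            id
        }
    }

    // Correct removal: clear the slot first, then recycle the ID.
    fn remove(&mut self, id: Id) -> Option<T> {
        match &self.slots[id.index] {
            Some((epoch, _)) if *epoch == id.epoch => {
                let (_, value) = self.slots[id.index].take().unwrap();
                self.free.push(id);
                Some(value)
            }
            // Epoch mismatch: the slot was clobbered ("Boom").
            _ => None,
        }
    }

    // The buggy ordering from this discussion: the ID is recycled
    // (ResourceInfo dropped) while the slot is still occupied.
    fn free_id_too_soon(&mut self, id: Id) {
        self.free.push(id);
    }
}

fn main() {
    let mut registry = Registry::new();
    let a = registry.insert("cb A");
    // Bug: A's ID is recycled before its slot is cleared...
    registry.free_id_too_soon(a);
    // ...so the next command buffer lands in A's still-occupied slot.
    let b = registry.insert("cb B");
    assert_eq!((a.index, a.epoch), (0, 1));
    assert_eq!((b.index, b.epoch), (0, 2));
    // Removing A now fails: its slot holds B's data.
    assert_eq!(registry.remove(a), None);
    assert_eq!(registry.remove(b), Some("cb B"));
}
```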

@nical
Contributor Author

nical commented Jan 26, 2024

Unfortunately, keeping an Arc to the resource in the Destroyed variant of the registry causes CommandBuffer::from_arc_into_baked to panic, because it expects to hold the only remaining Arc to the command buffer. There's a tension between this code using reference counting to signify actual usage of the resource and the ResourceInfo lifetime (tied to the resource itself) determining how IDs can be reused in the registry.

We could put the ResourceInfo into an Arc of its own, but because of all the generics involved that would have to be done for all resources, which isn't great. We could also devise another way to decide when to reuse IDs; I would prefer it to stay close to the registry itself instead of being driven by the resource reference counts. Ideally, IDs would be freed by the Global::<resource>_drop functions, but for that to work well we need to remove the remaining uses of IDs during command recording, otherwise one could drop and re-create a resource used by a command encoder before submit.
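The constraint behind the from_arc_into_baked panic is the standard Arc uniqueness rule: unwrapping an Arc into its inner value only succeeds when it holds the last strong reference. A small std-only illustration (not the actual wgpu types):

```rust
use std::sync::Arc;

fn main() {
    // With a single strong reference, try_unwrap succeeds and returns
    // the inner value by move, which is what a function like
    // from_arc_into_baked relies on.
    let only = Arc::new(String::from("command buffer"));
    assert!(Arc::try_unwrap(only).is_ok());

    // A second strong reference (e.g. one kept alive by the registry's
    // Destroyed variant) makes try_unwrap fail: this is the panic
    // condition described above.
    let shared = Arc::new(String::from("command buffer"));
    let registry_copy = Arc::clone(&shared);
    assert!(Arc::try_unwrap(shared).is_err());
    drop(registry_copy);
}
```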

@nical
Contributor Author

nical commented Jan 26, 2024

Another solution that doesn't require as much refactoring is to make command buffers snatchable. We probably should not use the Snatchable type for this, since snatching a resource while anything wgpu-related is happening on another thread guarantees lock contention. But a mutex should suffice, since when we extract the command buffer we already expect to be the only users.
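The mutex-based snatching suggested here can be sketched as a `Mutex<Option<T>>` whose value is moved out under a short-lived lock (an illustrative sketch, not the code that eventually landed): after the snatch, later users observe `None` and know the buffer was already consumed.

```rust
use std::sync::Mutex;

// Illustrative snatchable wrapper: `snatch` moves the value out under a
// briefly held lock. Since submission is expected to be the only user
// at that point, the lock is uncontended in practice.
struct Snatchable<T> {
    inner: Mutex<Option<T>>,
}

impl<T> Snatchable<T> {
    fn new(value: T) -> Self {
        Self { inner: Mutex::new(Some(value)) }
    }

    /// Takes the value, leaving `None` behind. Returns `None` if it was
    /// already snatched (e.g. the command buffer was already submitted).
    fn snatch(&self) -> Option<T> {
        self.inner.lock().unwrap().take()
    }
}

fn main() {
    let cb = Snatchable::new("encoded commands");
    assert_eq!(cb.snatch(), Some("encoded commands"));
    // A second attempt observes that the buffer is already gone.
    assert_eq!(cb.snatch(), None);
}
```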

@Wumpf
Member

Wumpf commented Jun 23, 2024

@nical with the other things you landed by now, has the situation changed much on how to fix this?

Snatchability out of the registry makes the most sense to me, since it best represents what should be happening (the queue submission takes over the lifetime of the actual resource; from a hub perspective it's already dead), but then again I'm not 100% sure I've been following correctly.

@nical
Contributor Author

nical commented Jun 24, 2024

I'm not sure, this is completely paged out of my brain. I'll try to rebase it and see if it breaks.

@cwfitzgerald cwfitzgerald merged commit ddff69b into gfx-rs:trunk Jun 24, 2024
25 checks passed
Wumpf pushed a commit to erichdongubler-mozilla/wgpu that referenced this pull request Jun 24, 2024
* Avoid leaking submitted command encoders

* changelog
ErichDonGubler pushed a commit to erichdongubler-mozilla/wgpu that referenced this pull request Jun 24, 2024
* Avoid leaking submitted command encoders

* changelog
@nical nical deleted the cmdbug-leak branch June 26, 2024 09:21
@teoxoy
Member

teoxoy commented Jul 2, 2024

For tracking purposes: This was unblocked by #5244.


Successfully merging this pull request may close these issues.

Submitting command encoders caused memory leak