Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi draw indirect feature #2315

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

rconde01
Copy link
Contributor

@rconde01 rconde01 commented Nov 16, 2021

This PR adds a multi-draw-indirect feature and is meant to replace #1949

It is rebased over the now merged #2022

Fixes #1354

spec/index.bs Outdated Show resolved Hide resolved
@github-actions
Copy link
Contributor

Previews, as seen when this build job started (b073087):
WebGPU | IDL
WGSL
Explainer

Copy link
Contributor

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
@AmesingFlank
Copy link

Hey! Any updates on this? Can we expect this feature to be available in v1.0?

@kainino0x
Copy link
Contributor

Since this is an optional feature I don"t think it will be a blocking feature for V1 release. After V1 this will be a living spec, and we"ll be doing incremental additions like this one (there won"t be a wait for a V1.1 or V2).

@kainino0x kainino0x added this to the Polish post-V1 milestone May 3, 2022
@kainino0x
Copy link
Contributor

Briefly discussed in meeting: We"re going to keep this in polish post-v1, won"t be able to spend the time to get it into the spec for V1. Will be soon after. Sorry to folks waiting for this!

@kainino0x kainino0x marked this pull request as draft September 22, 2022 16:16
@kainino0x kainino0x modified the milestones: Polish post-V1, Milestone 2? Aug 15, 2023
@rconde01
Copy link
Contributor Author

@kainino0x We would like to restart this discussion based on your comment above. We"re happy to assist in any investigations required to move this forward.

@kainino0x
Copy link
Contributor

Let me propose this for Milestone 1 and add it to the agenda. Unfortunately we don"t have a meeting this week, but hopefully next week (in the Pacific-timed meeting).

@kainino0x kainino0x modified the milestones: Polish post-V1, Milestone 0, Milestone 1 Oct 24, 2023
@rconde01
Copy link
Contributor Author

Sounds good.

spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated Show resolved Hide resolved
Draws primitives using an array of parameters read from a {{GPUBuffer}}.
See [[#rendering-operations]] for the detailed specification.

This method is defined if and only if the {{GPUFeatureName/"multi-draw-indirect"}} [=feature=] is enabled.
Copy link
Contributor

@kainino0x kainino0x Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the check I was missing above.

Just an editorial thing but we"ll need to move this to a content-timeline check that throws "TypeError" if the feature isn"t enabled, as WebIDL doesn"t let us dynamically decide whether to define something or not.

spec/index.bs Outdated Show resolved Hide resolved
spec/index.bs Outdated
optional GPUSize64 drawCountOffset = 0);
undefined multiDrawIndexedIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset,
GPUSize32 maxDrawCount,
optional GPUBuffer drawCountBuffer = null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit for editors: need to check if these GPUBuffers need to be nullable

@kainino0x
Copy link
Contributor

kainino0x commented Oct 24, 2023

If you make revisions on this PR, feel free to not deal with doing a rebase (though you can if you want to). We can deal with it later.

@rconde01 rconde01 force-pushed the add-multi-draw-indirect-feature branch from 4b20359 to a58d222 Compare October 25, 2023 18:02
@rconde01
Copy link
Contributor Author

@kainino0x I think the feedback is addressed. I was able to do the rebase, but some of the editor notes got stomped.

@@ -1828,6 +1828,12 @@ A <dfn dfn>supported limits</dfn> object has a value for every limit defined by
<tr class=row-continuation><td colspan=4>
The maximum value for the arguments of
{{GPUComputePassEncoder/dispatchWorkgroups(workgroupCountX, workgroupCountY, workgroupCountZ)}}.

<tr><td><dfn>maxMultiDrawIndirectCount</dfn>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few thoughts here:

  • Shouldn"t this also apply to the value in the drawCountBuffer?
  • Does this mean that after enabling the feature you still need to request a higher limit or else will be stuck to 1? It"s workable but seems a bit weird.
  • Seems a bit weird that this could be 1 when the feature is not enabled. In fact it seems weird the limit exists at all when the feature isn"t enabled...I don"t think there precedent for feature related limits though.
  • @kvark had suggested in a previous comment to " lift the limit entirely...I.e. it becomes limited in the same way as maximum index count is limited." It"s not clear to me how "maximum index count is limited", other than it"s tied to the max buffer size. Maybe the feature should only be available when the implementation max count is >= maxStorageBufferBindingSize and then this limit goes away? Not sure how reasonable that is.

Copy link
Contributor Author

@rconde01 rconde01 Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn"t this also apply to the value in the drawCountBuffer?

Realized this is already covered transitively since drawCountBuffer is already limited by maxDrawCount. Nm.

Copy link
Contributor

@kainino0x kainino0x Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sirtsu55 is working on this feature in Dawn and we got to talking about the limit. I am not sure we need one - if it"s guaranteed to be large on all systems then we could just hardcode a limit, or derive a limit from maxBufferSize.

  • Vulkan: has a limit but it"s always either 1 or huge, so when the feature is available we could just have a hardcoded limit as high as 2**30.
  • D3D12: not sure yet, currently investingating.
  • Metal: There is already a maxBufferSize which imposes some limit. On weaker devices like old iPhones (the ones where maxBufferSize is 256MiB) there"s a good chance this is the limiting factor. However the limiting factor wouldn"t be the size on the indirect buffer (256MiB/16B=16.6M draws or 256MiB/20B=13.4M indexed draws) but the size of the indirect command buffer (ICB) that we generate internally in our draw call validation/rerecording shader. There is a maxCommandCount when you allocate an ICB, but we haven"t found any documented limit on how large this can be. We don"t know what the size of a draw command is so we can"t seem to derive it from maxBufferLength - maybe in practice it"s something like maxBufferLength / 32 bytes (256MiB/32B = 8.3M). We will probably try probing the Metal validation layer, but @mwyrzykowski @litherum would you be able to tell us how to determine this limit - maybe what the validation layer checks and/or whether it would be large enough on all target systems (#1069) that we can hardcode it?

Copy link

@mwyrzykowski mwyrzykowski Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kainino0x The metal validation layer does not appear to validate maxCommandCount specifically, though you may run into validation errors due to the maxBufferLength.

It appears we need the following to hold device.maxBufferLength <= 40 + maxCommandCount * ((2 + 2 * desc.maxVertexBufferBindCount + desc.maxFragmentBufferBindCount) * 8), where desc is the ICB descriptor.

I will need to check if that is strictly for validation or applies in general, we could start there as a limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwyrzykowski thanks! Since multidraw will always inherit vertex/fragment buffers, those should be 0, simplifying to 40 + maxCommandCount * 2 * 8. (Also, did you mean the reverse, ... <= maxBufferLength?)

Just want to verify that 16 bytes per command is correct. That is too small for an indexed indirect call (20 bytes) so I"m guessing it is 2 pointers, one which points to the indexed indirect parameters allocated in some other buffer. If so, does the size of that buffer impose a limit? Or will it split into multiple buffers if it doesn"t fit?

I"m also concerned that there will be three huge allocations, and we"ll run out of memory anyway: the user"s draw params, the ICB full of pointers, and Metal"s internal allocation(s) that hold draw params.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great question and I don"t think we have an answer for it. Maybe it would be possible to expand multidraw calls in bundles out, into allocation space for many single-draw calls, which we can write to later? But that would leave (maxDrawCount - actual draw count) of unused draw calls in the bundle which is probably very inefficient.

Maybe we have to say multidraw is not supported in bundles to start.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would certainly be easier, limiting this functionality to GPURenderPassEncoder

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kainino0x What should we do about the maxMultiDrawIndirectCount limit?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to put it out there:

From my investigation of D3D12 in WebGPU the limiting factor is the buffer size. I tried counts up to 2^26 in a native D3D12 app, it does not work. Vulkan has a property and Metal is untested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don"t know, I"d propose giving a hardcoded value of 10M or something to punt again on adding limits that are part of features (#4613). Unless we think that will be too small.

Of course we need to test that, and we also need to check that setting maxDrawCount=10M and having a small draw count in the drawCountBuffer can perform passably well. (It should, though in Dawn we need to fix https://crbug.com/367394543.)


undefined multiDrawIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset,
GPUSize32 maxDrawCount,
optional GPUBuffer drawCountBuffer = null,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from @kainino0x: nit for editors: need to check if these GPUBuffers need to be nullable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match feature parity with D3D12 and Vulkan the drawCountBuffer should be nullable, in which case maxDrawCount becomes the draw count.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the question was about optional vs nullable

Copy link
Contributor

@kainino0x kainino0x Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer this old question I think we should make it optional and not nullable. That should still allow null.

Suggested change
optional GPUBuffer drawCountBuffer = null,
optional GPUBuffer drawCountBuffer,

Allowing any of these (I believe) and combinations thereof:

pass.multiDrawIndirect(indirectBuffer, 0, 100);
pass.multiDrawIndirect(indirectBuffer, 0, 100, null);
pass.multiDrawIndirect(indirectBuffer, 0, 100, null, 0);
pass.multiDrawIndirect(indirectBuffer, 0, 100, null, null);
pass.multiDrawIndirect(indirectBuffer, 0, 100, undefined);
pass.multiDrawIndirect(indirectBuffer, 0, 100, undefined, 0);
pass.multiDrawIndirect(indirectBuffer, 0, 100, undefined, undefined);

The <dfn dfn for=>indirect multiDraw parameters</dfn> encoded in the |indirectBuffer| must be zero or more
tightly packed blocks of **four 32-bit unsigned integer values (16 bytes total)**, given in the same
order as the arguments for {{GPURenderCommandsMixin/draw()}}. For example:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should call out here that indirect-first-instance is not required to use firstInstance here, since it is not obvious.

The <dfn dfn for=>indirect multiDrawIndexed parameters</dfn> encoded in the |indirectBuffer| must be zero or more
tightly packed blocks of **five 32-bit unsigned integer values (20 bytes total)**, given in
the same order as the arguments for {{GPURenderCommandsMixin/drawIndexed()}}. For example:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@kainino0x
Copy link
Contributor

^ just dropping some editorial notes for when we pick up this PR again

- |drawCountOffset| + sizeof([=indirect multiDraw count=]) &le;
|drawCountBuffer|.{{GPUBuffer/[[size]]}}.
- |drawCountOffset| is a multiple of 4.
</div>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When implementing this for D3D12, the driver / validation layer checks if maxDrawCount exceeds the range of the indirect buffer. Specifically it returns E_INVALIDARG when closing the command list. We should add this to the validation. Something along the lines of: indirectOffset + sizeof(indirectDrawParams) * maxDrawCount should not be out of bounds of indirectBuffer

@kainino0x
Copy link
Contributor

@mwyrzykowski @litherum question buried above: #2315 (comment)

- |drawCountOffset| is a multiple of 4.
</div>
1. Add |indirectBuffer| and |drawCountBuffer| (if given) to the [=usage scope=] as [=internal usage/input=].
</div>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
</div>
`1. Increment |this|.{{GPURenderCommandsMixin/[[drawCount]]}} by |maxDrawCount|.`
</div>

We won"t know the exact number of draw calls issued but maxDrawCount is guaranteed to be the maximum so drawCount should be incremented by it.

@kainino0x
Copy link
Contributor

kainino0x commented Sep 24, 2024

Posted an update on the issue #1354:

Chrome Canary should have chromium-experimental-multi-draw-indirect on supported platforms with chrome://flags#enable-unsafe-webgpu enabled.

For anyone who is interested, please try it out and report back! For known Chromium issues, see: https://crbug.com/369246557/dependencies (and please file any bugs you find at https://crbug.com/dawn/new)

If you"d like to help move this feature forward, the most important way to contribute would be by writing conformance tests; please chime in here: gpuweb/cts#3961

- |drawCountBuffer|.{{GPUBuffer/[[usage]]}} contains {{GPUBufferUsage/INDIRECT}}.
- |drawCountOffset| + sizeof([=indirect multiDraw count=]) &le;
|drawCountBuffer|.{{GPUBuffer/[[size]]}}.
- |drawCountOffset| is a multiple of 4.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to allow any value of drawCountOffset if drawCountBuffer is not set?

Copy link
Contributor

@kainino0x kainino0x Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just keep the validation, but we could also consider using overloads instead so that we can never have a drawCountOffset when there is no drawCountBuffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don"t know what we"ll do here but two assorted notes:

  • setVertexBuffer still validates the buffer offset when buffer is null
  • I like the overload solution best but we should think about what it will look like in languages without overloads (C webgpu.h). I think we currently only have one overloaded method in the API (setBindGroup) and that overload doesn"t apply to C at all.
    • Two functions? Merge into one function where if drawCountBuffer is null, drawCountOffset {is ignored,must be 0,etc}

1. If any of the following conditions are unsatisfied, make |this| [=invalid=] and stop.
<div class=validusage>
- It is [$valid to draw$] with |this|.
- |indirectBuffer| is [$valid to use with$] |this|.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Opening a thread here because the threads on this issue still track all the things we need to do before landing this feature in the spec.)

According to #1354 (comment) there may be limitations on what can be done on a command buffer in Metal if supportIndirectCommandBuffers is set to true. If so we may need to account for them in the spec. When we get a chance we"ll look into the issue/repro on the Chromium issue and see if we think it needs to be accounted for in the spec. (Or maybe someone can find out whether wgpu has this problem and has already found a limitation or workaround.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kainino0x WebKit always sets supportIndirectCommandBuffers to YES for all PSOs and we have no such issue problem running three.js"s examples - https://github.com/search?q=repo:WebKit/WebKit supportIndirectCommandBuffers&type=code

I think it is because our implementation is based on Argument Buffers, if Chrome is planning to adopt that then that would be an option without changing the specification.

But setting supportIndirectCommandBuffers on the PSO does have a performance penalty, so I wouldn"t rule out a spec change if we don"t want to always enable this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That"s super useful info, thank you!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One such option would be extending GPUShaderModuleCompilationHint to indicate whether or not an entry point from a given shader module will be used with MTLIndirectCommandBuffer

If we want to explore that route, it would be nice to apply this to GPURenderBundle as well. I.e., if you set the flag to 0 indicating you will not use ICBs with a given PSO then you can not call executeBundles(...) from the RenderPassEncoder which uses that given entry point.

Then at GPURenderPipelineCreation, an implementation can lookup if the given entry point needs to set supportIndirectCommandBuffers to YES or not and potentially impact codegen as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api WebGPU API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DrawIndirectCount
8 participants