You know how `async` methods that `await` something incomplete end up creating a few objects, right? There's the boxed state machine, an `Action` that moves it forward, a `Task[<T>]`, etc - right?

Well... what about if there just wasn't?

And what if all you had to do was change your `async ValueTask<int>` method to `async PooledValueTask<int>`?

And I hear you; you're saying "but I can't change the public API!". But what if a `PooledValueTask<int>` really was a `ValueTask<int>`? So you can just cheat:

    public ValueTask<int> DoTheThing() // the outer method is not async
    {
        return ReallyDoTheThing(this);

        static async PooledValueTask<int> ReallyDoTheThing(SomeType obj)
        {
            ... await ...
            // (use obj.* instead of this.*)
            ... return ...
        }
    }

(the use of a `static` local function here avoids a `<>c__DisplayClass` wrapper from how the local-function capture context is implemented by the compiler)

And how about if maybe just maybe in the future it could be (if this happens) just:

    [SomeKindOfAttribute] // <=== this is the only change
    public async ValueTask<int> DoTheThing()
    {
        // no changes here at all
    }

(although note that in some cases it can work better with the `static` trick, as above)

Would that be awesome? Because that's what this is!

The `PooledValueTask[<T>]` etc exist mostly to define a custom builder. The builder in this library uses aggressive pooling of classes that replace the boxed approach used by default; we recycle them when the state machine completes.

It also makes use of the `IValueTaskSource[<T>]` API to allow incomplete operations to be represented without a `Task[<T>]`, but with a custom backer. And we pool that too, recycling it when the task is awaited. The only downside: you can't `await` the same result twice now, because once you've awaited it the first time, it has gone. A cycling token is used to make sure you can't accidentally read the incorrect values after the result has been awaited.
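
To make the "await once" rule and the cycling token a bit more concrete, here is a minimal sketch that uses only the framework's own `ManualResetValueTaskSourceCore<T>` (available on .NET Core 3.0 and later). The `ReusableSource` type is hypothetical and is not this library's implementation; it only illustrates the general idea of a reusable backer whose token changes once the result has been consumed. For brevity it only resets on the success path.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical illustration type; NOT the backer used by this library.
sealed class ReusableSource : IValueTaskSource<int>
{
    // Mutable struct: must not be declared readonly.
    private ManualResetValueTaskSourceCore<int> _core;

    // Each ValueTask captures the token (version) that is current when it is created.
    public ValueTask<int> Task => new ValueTask<int>(this, _core.Version);

    public void SetResult(int value) => _core.SetResult(value);

    public int GetResult(short token)
    {
        int result = _core.GetResult(token); // throws InvalidOperationException for a stale token
        _core.Reset();                       // cycle the token: the source is ready for reuse
        return result;
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}

static class Demo
{
    static async Task Main()
    {
        var source = new ReusableSource();
        ValueTask<int> pending = source.Task;
        source.SetResult(42);

        Console.WriteLine(await pending); // first await consumes the result: 42

        try
        {
            _ = await pending; // second await presents the old token
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("second await rejected: the token has moved on");
        }
    }
}
```

The design point is that the token check turns a silent "read someone else's value" bug into an immediate exception, which is what makes recycling the backer safe.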

We can even do this for `Task[<T>]`, except here we can only avoid the boxed state machine; hence `PooledTask[<T>]` exists too. No custom backing in this case, though, since a `Task[<T>]` will need to be allocated (except for `Task.CompletedTask`, which we special-case).

The results below are based on an operation that uses `Task.Yield()` to ensure that the operations are incomplete; ".NET" means the inbuilt out-of-the-box implementation; "Pooled" means the implementation from this library.

In particular, notice:

- zero allocations for `PooledValueTask[<T>]` vs `ValueTask[<T>]` (on .NET Core; significantly reduced on .NET Framework)
- reduced allocations for `PooledTask[<T>]` vs `Task[<T>]`
- no performance degradation; just lower allocations

| Method | Job | Runtime | Categories | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------- |----- |-------- |------------- |---------:|----------:|----------:|-------:|-------:|-------:|----------:|
| .NET | Clr | Clr | Task<T> | 2.159 us | 0.0427 us | 0.0474 us | 0.0508 | 0.0039 | - | 344 B |
| Pooled | Clr | Clr | Task<T> | 2.037 us | 0.0246 us | 0.0230 us | 0.0273 | 0.0039 | - | 182 B |
| .NET | Core | Core | Task<T> | 1.397 us | 0.0024 us | 0.0022 us | 0.0176 | - | - | 120 B |
| Pooled | Core | Core | Task<T> | 1.349 us | 0.0058 us | 0.0054 us | 0.0098 | - | - | 72 B |
| | | | | | | | | | | |
| .NET | Clr | Clr | Task | 2.065 us | 0.0200 us | 0.0167 us | 0.0508 | 0.0039 | - | 336 B |
| Pooled | Clr | Clr | Task | 1.979 us | 0.0179 us | 0.0167 us | 0.0273 | 0.0039 | - | 182 B |
| .NET | Core | Core | Task | 1.390 us | 0.0159 us | 0.0149 us | 0.0176 | - | - | 112 B |
| Pooled | Core | Core | Task | 1.361 us | 0.0055 us | 0.0051 us | 0.0098 | - | - | 72 B |
| | | | | | | | | | | |
| .NET | Clr | Clr | ValueTask<T> | 2.087 us | 0.0403 us | 0.0431 us | 0.0547 | 0.0078 | 0.0039 | 352 B |
| Pooled | Clr | Clr | ValueTask<T> | 1.924 us | 0.0248 us | 0.0220 us | 0.0137 | 0.0020 | - | 100 B |
| .NET | Core | Core | ValueTask<T> | 1.405 us | 0.0078 us | 0.0073 us | 0.0195 | - | - | 128 B |
| Pooled | Core | Core | ValueTask<T> | 1.374 us | 0.0116 us | 0.0109 us | - | - | - | - |
| | | | | | | | | | | |
| .NET | Clr | Clr | ValueTask | 2.056 us | 0.0206 us | 0.0183 us | 0.0508 | 0.0039 | - | 344 B |
| Pooled | Clr | Clr | ValueTask | 1.948 us | 0.0388 us | 0.0416 us | 0.0137 | 0.0020 | - | 100 B |
| .NET | Core | Core | ValueTask | 1.408 us | 0.0140 us | 0.0117 us | 0.0176 | - | - | 120 B |
| Pooled | Core | Core | ValueTask | 1.366 us | 0.0039 us | 0.0034 us | - | - | - | - |

Note that most of the remaining allocations are actually the work-queue internals of `Task.Yield()` (i.e. how `ThreadPool.QueueUserWorkItem` works) - we've removed virtually all of the unnecessary overheads that came from the `async` machinery. Most real-world scenarios aren't using `Task.Yield()` - they are waiting on external data, etc - so they won't see these. Plus they are effectively zero on .NET Core 3.

The tests do the exact same thing; the only thing that changes is the return type, i.e. whether it is `async Task<int>`, `async ValueTask<int>`, `async PooledTask<int>` or `async PooledValueTask<int>`.

All of them have the same threading/execution-context/sync-context semantics; there's no cheating going on.
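
As a rough sketch of what "same body, different return type" looks like, the pair below differs only in the declared return type of the `async` method. This is not the actual benchmark code: the method names and the value `42` are made up for illustration, and the `PooledAwait` namespace is an assumption about where `PooledValueTask<T>` lives. The pooled version is wrapped in a non-`async` method, as in the "cheat" shown earlier, so callers see a plain `ValueTask<int>` either way.

```csharp
using System;
using System.Threading.Tasks;
using PooledAwait; // assumption: namespace exposing PooledValueTask<T>

static class Comparison
{
    // Inbuilt machinery (".NET" in the table).
    static async ValueTask<int> DotNetAsync()
    {
        await Task.Yield(); // forces the incomplete (asynchronous) path
        return 42;
    }

    // Same body; only the async method's return type differs ("Pooled" in the table).
    static ValueTask<int> PooledAsync()
    {
        return Impl();

        static async PooledValueTask<int> Impl()
        {
            await Task.Yield();
            return 42;
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await DotNetAsync());
        Console.WriteLine(await PooledAsync());
    }
}
```

Each result is awaited exactly once, which is all the pooled variant permits.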