executor: rewrite the work-stealing thread pool #1657
Conversation
This patch is a ground-up rewrite of the existing work-stealing thread pool. The goal is to reduce overhead while simplifying code when possible. At a high level, the following architectural changes were made:

- The local run queues were switched to bounded circular buffer queues.
- Cross-thread synchronization was reduced.
- Task constructs were refactored to use a single allocation and always include a join handle (#887).
- The logic for putting workers to sleep and waking them up was simplified.

Local run queues

Move away from crossbeam's implementation of the Chase-Lev deque. That implementation carried unnecessary overhead because it supports capabilities that the work-stealing thread pool does not need. Instead, a fixed-size circular buffer is used for the local queue. When the local queue is full, half of the tasks it contains are moved to the global run queue.

Reduce cross-thread synchronization

This is done via many small improvements. Primarily, an upper bound is placed on the number of concurrent stealers; limiting the number of stealers results in lower contention. Secondly, the rate at which workers are notified and woken up is throttled, which also reduces contention by preventing many threads from racing to steal work.

Refactor the task structure

Now that Tokio is able to target a Rust version that supports `std::alloc` as well as `std::task`, the pool can optimize how the task structure is laid out. A single allocation per task is now required, and a join handle is always provided, enabling the spawner to retrieve the result of the task (#887).

Simplify logic

Where possible, complexity is reduced in the implementation by using locks and other simpler constructs in cold paths. The set of sleeping workers is now represented as a `Mutex<VecDeque<usize>>`. Instead of optimizing access to this structure, we reduce how often the pool must access it.

Finally, `threadpool::blocking` has been (temporarily) removed. This capability will come back later, but the original implementation was more complicated than necessary.
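For readers skimming the description, here is a rough, self-contained sketch of the overflow policy described above for the local run queues. It is illustrative only: the actual queue in this PR is a fixed-size, lock-free circular buffer, whereas this sketch uses `VecDeque` and a `Mutex` purely to show when tasks spill into the global injection queue.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Illustrative stand-in for the fixed-size local queue: when it fills up,
/// half of its tasks are moved to the shared injection (global) queue.
struct LocalQueue<T> {
    buffer: VecDeque<T>, // stand-in for the fixed-size circular buffer
    capacity: usize,
}

impl<T> LocalQueue<T> {
    fn new(capacity: usize) -> Self {
        LocalQueue { buffer: VecDeque::with_capacity(capacity), capacity }
    }

    fn push(&mut self, task: T, injection: &Mutex<VecDeque<T>>) {
        if self.buffer.len() == self.capacity {
            // Local queue is full: move half of the queued tasks to the
            // global injection queue so other workers can pick them up.
            let mut global = injection.lock().unwrap();
            for _ in 0..self.capacity / 2 {
                if let Some(t) = self.buffer.pop_front() {
                    global.push_back(t);
                }
            }
        }
        self.buffer.push_back(task);
    }

    fn pop(&mut self) -> Option<T> {
        self.buffer.pop_front()
    }
}

fn main() {
    let injection = Mutex::new(VecDeque::new());
    let mut local = LocalQueue::new(4);

    for task in 0..6 {
        local.push(task, &injection);
    }

    // Two tasks spilled into the injection queue when the local queue filled.
    assert_eq!(injection.lock().unwrap().len(), 2);
    assert_eq!(local.pop(), Some(2));
}
```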
This is great overall and the associated blog post is a really good explanation. I commented on some non-blocking nits.
I tried out the branch with some tokio-postgres benchmarks and hit this panic:
@sfackler can you give me repro steps?
I hit it running the benchmarks. It seems like you can probably run into it by just dropping a runtime with a spawned future still alive, though.
@sfackler thanks... looking at your backtrace, it looks like there is a bug when the reactor notifies tasks (which is internal to the thread pool) while dropping... I don't think I thought of testing that case. ❤️ the QA
This is probably not super important, but let's make it technically correct.
Co-Authored-By: Doug Wilson <[email protected]>
Co-Authored-By: Eliza Weisman <[email protected]>
Co-Authored-By: Eliza Weisman <[email protected]>
Co-Authored-By: Taiki Endo <[email protected]>
Co-Authored-By: Taiki Endo <[email protected]>
I have reproed @sfackler's reported bug. It is indeed a shutdown bug. That is what I get for not using loom to test the shutdown process :'( Adding a loom shutdown test catches it.
I’m sorry to barge in like that. I’d just like to cross-post what I wrote on Reddit over here, where it might be a better fit. 🙂 Nice work! In particular I’d like to mention the
@lhecker thanks for the references. There are definitely still areas to improve on in this PR. I’ll be taking a look. Feel free to leave whatever other feedback you might have.
When shutting down the scheduler, the first step is to close the global (injection) queue. Once this is done, it is possible for a task to be notified from *outside* of a scheduler thread. This will attempt to push the task onto the injection queue but fail. When this happens, the task must be explicitly shut down or bad things will happen.
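To make the race described in this commit message concrete, here is a minimal, self-contained loom model of the same shape of problem: one thread closing a queue while another thread tries to push into it. This is not the test added in this PR, and `Injection` here is a stand-in type, not Tokio's injection queue.

```rust
// Minimal loom model: a "close" path racing with a "push" path.
use loom::sync::{Arc, Mutex};
use loom::thread;

struct Injection {
    // `None` means the queue has been closed as part of shutdown.
    queue: Mutex<Option<Vec<u32>>>,
}

#[test]
fn close_races_with_push() {
    loom::model(|| {
        let inj = Arc::new(Injection {
            queue: Mutex::new(Some(Vec::new())),
        });

        let pusher = {
            let inj = inj.clone();
            thread::spawn(move || {
                let mut q = inj.queue.lock().unwrap();
                match q.as_mut() {
                    // Normal path: the queue is still open.
                    Some(q) => q.push(1),
                    // The queue was closed first: in the real scheduler this
                    // is where the task must be explicitly shut down instead
                    // of being silently dropped.
                    None => {}
                }
            })
        };

        // Shutdown path: close the injection queue.
        *inj.queue.lock().unwrap() = None;

        pusher.join().unwrap();
    });
}
```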
@sfackler I fixed the bug you reported. By adding a loom test, I found some additional bugs related to shutdown. I applied a temporary fix, but it needs some cleanup.
pub(super) executor: CausalCell<Option<NonNull<S>>>,

/// Pointer to next task, used for misc task linked lists.
pub(crate) queue_next: UnsafeCell<*const Header<S>>,
Should this be changed to an `Option<NonNull<Header<S>>>` as well, if only for consistency with the owned list?
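For anyone following along, the two representations are interchangeable (and `Option<NonNull<T>>` has the same size as `*const T` thanks to the null niche); the `Option<NonNull<_>>` form mainly encodes the null check in the type. A standalone illustration, not code from this PR, with `Header` as a placeholder struct:

```rust
use std::ptr::NonNull;

// Placeholder type for illustration only; not Tokio's task header.
struct Header {
    id: usize,
}

fn to_option(raw: *const Header) -> Option<NonNull<Header>> {
    // `NonNull::new` returns `None` for a null pointer, so "is this null?"
    // becomes explicit at the type level.
    NonNull::new(raw as *mut Header)
}

fn to_raw(ptr: Option<NonNull<Header>>) -> *const Header {
    ptr.map_or(std::ptr::null(), |p| p.as_ptr() as *const Header)
}

fn main() {
    let header = Header { id: 1 };
    let opt = to_option(&header as *const Header);
    assert!(opt.is_some());
    assert!(to_option(std::ptr::null()).is_none());
    let _raw = to_raw(opt);
}
```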
let (pool, mut w, mock_park) = pool!(!2);
(pool, w.remove(0), w.remove(0), mock_park)
}};
(! $n:expr) => {{
TIOLI: I think typically @ is used in macros for "inner" match arms?
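For readers not familiar with the convention being referenced: prefixing a macro's internal rules with `@` keeps them from colliding with the public entry points. A generic illustration of the idiom, unrelated to the `pool!` macro itself:

```rust
macro_rules! count {
    // Internal rules are prefixed with `@accum` so callers never match them
    // by accident; this is the convention mentioned above.
    (@accum $acc:expr;) => { $acc };
    (@accum $acc:expr; $head:expr $(, $tail:expr)*) => {
        count!(@accum $acc + 1; $($tail),*)
    };
    // Public entry point.
    ($($item:expr),*) => {
        count!(@accum 0; $($item),*)
    };
}

fn main() {
    assert_eq!(count!(), 0);
    assert_eq!(count!(1, 2, 3), 3);
}
```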
for (_, join) in map.iter_mut() {
    let mut cx = Context::from_waker(noop_waker_ref());
    match Pin::new(join).poll(&mut cx) {
Nit/TIOLI: Could be simplified to
if let Ready(n) = Pin::new(join).poll(&mut cx) {
    num = Some(n.unwrap());
    break;
}
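For completeness, a compilable stand-alone version of that suggestion, using the `futures` crate; the `ready` futures here are stand-ins for the actual join handles in the test:

```rust
use std::collections::HashMap;
use std::pin::Pin;
use std::task::{Context, Poll::Ready};

use futures::future::{ready, Ready as ReadyFuture};
use futures::task::noop_waker_ref;
use futures::Future;

fn main() {
    // Stand-in for the map of join handles in the test.
    let mut map: HashMap<u32, ReadyFuture<Result<u32, ()>>> = HashMap::new();
    map.insert(0, ready(Ok(42)));

    let mut num = None;
    let mut cx = Context::from_waker(noop_waker_ref());

    for (_, join) in map.iter_mut() {
        // `if let` replaces the `match` with a single-armed pattern.
        if let Ready(n) = Pin::new(join).poll(&mut cx) {
            num = Some(n.unwrap());
            break;
        }
    }

    assert_eq!(num, Some(42));
}
```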
Co-Authored-By: Eliza Weisman <[email protected]>
Co-Authored-By: Eliza Weisman <[email protected]>
This article goes into details of the implementation and would be helpful reading when reviewing this PR: https://tokio.rs/blog/2019-10-scheduler/
Results

The thread pool benchmarks have improved significantly (old thread pool vs. new thread pool).

Real-world benchmarks improve significantly as well. This is testing the hyper hello world server using `wrk -t1 -c50 -d10` (old scheduler vs. new scheduler).