
A few threaded scheduler fixups #8143

Open · jcrist wants to merge 3 commits into main

Conversation

jcrist
Member

@jcrist jcrist commented Sep 14, 2021

Previously there was a bug where the number of threads used by the
threaded scheduler was different if `dask.compute` was called in the
main thread vs in a background thread. While fixing this, I also noticed
that the logic around maintaining a cache of executors could be done in
a simpler way that avoided the need for a lock or active management (we
could instead rely on GC behavior). This patch includes fixes for both.
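
To illustrate the GC-based approach described above, here is a minimal, hypothetical sketch (not the actual code in this PR) of a per-thread executor cache kept on a `threading.local`. The `_ExecutorPool` and `_get_executor` names are illustrative; the diff context shown just below uses the same `_EXECUTORS` thread-local idea.

    import threading
    from concurrent.futures import ThreadPoolExecutor

    _EXECUTORS = threading.local()  # one cache per calling thread


    class _ExecutorPool:
        """Illustrative per-thread cache of executors keyed by worker count."""

        def __init__(self):
            self._pools = {}

        def get(self, num_workers):
            # Reuse an existing executor for this thread when possible.
            if num_workers not in self._pools:
                self._pools[num_workers] = ThreadPoolExecutor(num_workers)
            return self._pools[num_workers]


    def _get_executor(num_workers):
        # Lazily create the cache the first time this thread calls in.
        if not hasattr(_EXECUTORS, "pool"):
            _EXECUTORS.pool = _ExecutorPool()
        return _EXECUTORS.pool.get(num_workers)

When the calling thread exits, its thread-local data (and the executors cached there) become unreachable and can be garbage collected; in CPython, idle `ThreadPoolExecutor` worker threads shut down once their executor has been collected, which is presumably the GC behavior the description refers to. No lock or explicit bookkeeping is needed.
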
    if not hasattr(_EXECUTORS, "pool"):
        _EXECUTORS.pool = _ExecutorPool()
    if not num_workers:  # treat both 0 and None the same
        # TODO: if num_workers is 1 should we still use more threads?
Member Author

Right now we default to `num_workers == CPU_COUNT`. Do we want to set a minimum threshold here? This came up in a Binder session where CPU_COUNT was 1, so the threaded scheduler wasn't resulting in any parallelism on a simple `time.sleep` example.

`multiprocessing.pool.ThreadPool` defaults to `os.cpu_count()`, but `concurrent.futures.ThreadPoolExecutor` does set a larger default of `cpu_count + 4`.

Member

I was wondering, since we are using `concurrent.futures.ThreadPoolExecutor`, shouldn't we try to be consistent with its defaults?
Checking the docs: since version 3.8, when `max_workers=None` it sets the default to `min(32, os.cpu_count() + 4)`, and when `max_workers=0` it raises a `ValueError` instead of treating it the same as `None`.

See: https://github.com/python/cpython/blob/9ccdc90488302b212bd3405d10dc5c22052e9b4c/Lib/concurrent/futures/thread.py#L128-L138
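
For reference, a quick snippet (assuming Python 3.8+; the exact formula has shifted slightly in newer versions) demonstrating the stdlib behavior described above; it peeks at the private `_max_workers` attribute purely for illustration:

    import os
    from concurrent.futures import ThreadPoolExecutor

    # On Python 3.8+, max_workers=None resolves to min(32, os.cpu_count() + 4)
    with ThreadPoolExecutor() as ex:
        print(ex._max_workers, min(32, (os.cpu_count() or 1) + 4))

    # max_workers=0 is rejected rather than being treated like None
    try:
        ThreadPoolExecutor(max_workers=0)
    except ValueError as exc:
        print(exc)  # "max_workers must be greater than 0"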


Note we should still be using the custom `dask.system.cpu_count` function rather than `os.cpu_count` directly, as it handles things like CPU affinity and cgroups.
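
As a small illustration (assuming a dask version where `dask.system` exposes `cpu_count()` and the `CPU_COUNT` constant), the two counts can differ, for example inside a container with a cgroup CPU quota:

    import os
    from dask.system import CPU_COUNT, cpu_count

    print(os.cpu_count())  # all cores the OS reports
    print(cpu_count())     # respects CPU affinity masks and cgroup limits
    print(CPU_COUNT)       # module-level constant computed at import time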

Member

IMHO using CPU_COUNT as a default still makes sense.

Likely the instances Binder uses lack multiple cores (and likely live in a multitenant deployment). If there is a better value to use there, it would make sense to document in notebooks or other docs so users know how to override it.

We could potentially check an environment variable and use that for CPU_COUNT (if Binder folks would like to override this generally). Have done similar things with conda-build and CI in conda-forge, which has worked well.

One other thought: in the case where the threaded scheduler ends up using only a single thread, we could issue a warning so the user is aware. This likely requires some thought to ensure the warning is not overly noisy (or entirely missed).
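
A rough, hypothetical sketch of what such a warning could look like (not something this PR implements; the helper name and message wording are made up):

    import warnings

    def _maybe_warn_single_thread(num_workers):
        # Hypothetical helper: warn when the threaded scheduler would run
        # with a single worker thread and therefore provide no parallelism.
        if num_workers == 1:
            warnings.warn(
                "The threaded scheduler is running with a single worker thread; "
                "no parallelism will be used. Set num_workers (or the "
                "DASK_NUM_WORKERS environment variable) to override.",
                stacklevel=3,
            )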

Member Author

This is already settable with an environment variable (DASK_NUM_WORKERS), but not every user knows that.

I don't think we need to match the heuristics of other Python threadpool-like things. I was mainly wondering if having a threaded scheduler that may end up using only 1 thread by default on some systems would be confusing to new users. But since dask is mainly used for compute-heavy work (rather than IO-heavy work), using more threads than cores outside of a demo context makes less sense. Fine to keep things as is.
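
For anyone who does want to override the default, a couple of hedged examples (assuming the `num_workers` keyword and config key behave as described in this thread, with `DASK_NUM_WORKERS=4` in the environment being equivalent to the config setting):

    import dask

    @dask.delayed
    def inc(x):
        return x + 1

    total = dask.delayed(sum)([inc(i) for i in range(8)])

    # Per-call keyword, forwarded to the threaded scheduler
    total.compute(scheduler="threads", num_workers=4)

    # Global config; DASK_NUM_WORKERS=4 in the environment does the same
    with dask.config.set(num_workers=4):
        total.compute(scheduler="threads")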

Member

Ah right. Good point.

Yeah I think it is ok to leave as-is. We could maybe have a warning, but we could also handle that as a separate issue (#8152).

@jcrist
Member Author

jcrist commented Sep 14, 2021

cc @ncclementi

@ncclementi
Member

We also seem to be having problems with `test_threaded_within_thread` not passing on Windows. I wonder if it's connected to how `get` works here:

result = get({"x": (lambda: i,)}, "x", num_workers=2)

@github-actions bot added the "needs attention" label ("It's been a while since this was pushed on. Needs attention from the owner or a maintainer.") on Oct 25, 2021
@jsignell
Member

This one has slipped through the cracks. Is there something to salvage?

@jakirkham
Member

Would think so as well. Generally this seemed reasonable.

Think it had CI failures (potentially unrelated?) at the time. The PR has also fallen out-of-date. Maybe fixing conflicts and merging with main to get the latest CI status would be a good first (maybe only) step?

@jsignell
Member

I can take on a cleanup effort @jcrist if that's ok with you.

dask/threaded.py (outdated suggestion, resolved)
Co-authored-by: jakirkham <[email protected]>
@jakirkham
Member

jakirkham commented Apr 1, 2022

Seeing this Windows test failure on CI:

    def test_threaded_within_thread():
        L = []
    
        def f(i):
            result = get({"x": (lambda: i,)}, "x", num_workers=2)
            L.append(result)
    
        before = threading.active_count()
    
        for i in range(20):
            t = threading.Thread(target=f, args=(1,))
            t.daemon = True
            t.start()
            t.join()
            assert L == [1]
            del L[:]
    
        start = time()  # wait for most threads to join
        while threading.active_count() > before + 10:
            sleep(0.01)
>           assert time() < start + 5
E           assert 1648825023.2966466 < (1648825018.2903678 + 5)
E               where 1648825023.2966466 = time()

dask\tests\test_threaded.py:144: AssertionError

Similar failures show up on all Windows CI runs. Maybe we should just bump 5 to 6 or similar?

@jcrist
Member Author

jcrist commented Apr 1, 2022

> Maybe we should just bump 5 to 6 or similar?

No, this is likely an actual Windows-specific bug (in the new code), not something timing-specific.

@github-actions bot removed the "needs attention" label on Sep 23, 2024