Supersedes #3495 (which was the wrong way of looking at the problem - my bad)
Closely related to #3304
Related to #3497, #2123
The Current Situation:
At the moment the Scheduler main loop is a monolithic function: one pass over everything the Scheduler has to do, followed by a sleep.
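The loop itself isn't reproduced here, but schematically it has roughly this shape (a sketch with hypothetical names, not the actual Scheduler code):

import time

def process_command_queue(): ...   # commands received by the server
def process_task_messages(): ...   # task-triggers and message-triggers
def check_ext_triggers(): ...      # old-style "ext" triggers
def iterate_task_pool(): ...       # monolithic task pool update
def health_check(): ...

for _ in range(3):  # stand-in for "while not stopped"
    process_command_queue()
    process_task_messages()
    check_ext_triggers()
    iterate_task_pool()
    health_check()
    time.sleep(1)  # the fixed sleep(), taken whether or not there is work to do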
Some things aren't in lock-step with the main loop (e.g. the network server and the subprocess pool). We bridge this gap using queues; at present we have the following queues:
- commands - For commands (eg. user commands) received by the server.
- ext_triggers - For old-style "ext" triggers.
- message - For task-triggers and message-triggers.
- SubProcPool.queuings/runnings - For queued/running subprocesses.
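As a rough sketch of this bridging pattern (not Cylc's actual code; the names are made up), an out-of-lock-step component puts work onto a queue and the main loop drains it on its next pass:

import queue
import threading
import time

commands = queue.Queue()  # stand-in for the scheduler's command queue

def server():
    # e.g. the network server thread, running out of lock-step with the main loop
    commands.put(("hold", {"task": "foo.1"}))

threading.Thread(target=server, daemon=True).start()

for _ in range(3):  # stand-in for main loop iterations
    while True:
        try:
            name, kwargs = commands.get_nowait()  # drain without blocking
        except queue.Empty:
            break
        print("processing command:", name, kwargs)
    time.sleep(1)  # work can wait for up to a full iteration before being seen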
Why Not #3495:
Making individual components of the main-loop asynchronous isn't actually going to be a meaningful improvement. It would allow us to run certain "chunks" of the main loop asynchronously (e.g. health checking) but not the main-loop as a whole. The IO component of these functions is very small so there is little to gain.
The real benefit to us of making the main loop asynchronous lies elsewhere.
Grandiose Long-Term Vision:
The real benefit to be gained from asynchronicity is breaking the main-loop lock-step, allowing the Scheduler to become event driven and removing the sleep() call.
- Breaks the "global" lock-step of the main-loop.
- The Scheduler becomes responsive on the time-scale of individual coroutines.
- Allows code to become event-driven.

We assimilate the main-loop into event-driven coroutines. There is no "while true, do something, sleep" loop. There is no sleep statement at all.
- The scheduler never sleeps when there is work to be done.
- The scheduler always sleeps unless there is work to be done.
- Monolithic updates (e.g. task pool iteration) become fast, event-oriented operations which only care about a single update.
- The pathway between cause and effect is minimised, providing a massive boost to responsiveness.

The main-loop functionality is broken down into small, easy to write, easy to read coroutines which do one thing and do it well.
- We can unit-test the hell out of coroutines and get coverage way up.
- We can simulate impossible-to-reproduce bugs in unit tests.
- We can use integration tests with small groups of coroutines to eliminate most of the functional test battery.
How It Works:
We write a collection of coroutines to be run in place of the imperative main loop.
A coroutine can be a producer and/or consumer of events. For example, the task pool logic is a consumer of messages (from the server) and a producer of task events.
async def task_pool(messages, task_events):
    while True:
        # yield control to other coroutines until a message arrives
        message = await messages.get()
        # do something
        # yield control to other coroutines to process this new event (and other stuff)
        await task_events.push(_)

Until a coroutine hits an await statement it is synchronous (i.e. blocking), so operations where consistency is important remain safe.
At a very high level the main coroutines might look something like this:
(Note: Trigger is an abstraction of task-triggers, message-triggers, ext-triggers and x-triggers.)
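Since that diagram isn't reproduced here, a hypothetical sketch of how a few such coroutines could be wired together and run in place of the imperative main loop, using plain asyncio.Queue as a stand-in for whatever messaging layer is finally chosen (all names are illustrative only):

import asyncio

async def server_stub(messages):
    # stand-in producer: pretend the server received some task messages
    for i in range(3):
        await messages.put(f"task message {i}")
        await asyncio.sleep(0.1)
    await messages.put(None)  # sentinel to shut down

async def task_pool(messages, task_events):
    # consumer of messages, producer of task events
    while True:
        message = await messages.get()
        if message is None:
            await task_events.put(None)
            break
        await task_events.put(f"event for {message}")

async def task_events_handler(task_events):
    # consumer of task events
    while True:
        event = await task_events.get()
        if event is None:
            break
        print("handled", event)

async def main():
    messages = asyncio.Queue()
    task_events = asyncio.Queue()
    await asyncio.gather(
        server_stub(messages),
        task_pool(messages, task_events),
        task_events_handler(task_events),
    )

asyncio.run(main())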
Queues, Processes And Threads:
The observant may have noticed that in the above diagram two coroutines are both consuming items from the "task events" queue, which wouldn't work. This is because I'm using "queue" in the loosest possible sense. Really the coroutines want a publisher-subscriber interface; if only we had a system in Cylc for this pattern... oh wait, ZMQ.
ZMQ can be used with in-proc communication to serve as a PUB-SUB queueing system for our coroutines.
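A minimal sketch of that idea using pyzmq's asyncio support (illustrative only; the endpoint name and message are made up):

import asyncio
import zmq
import zmq.asyncio

async def main():
    ctx = zmq.asyncio.Context()
    # PUB and SUB sockets must share the same context to use the inproc transport
    pub = ctx.socket(zmq.PUB)
    pub.bind("inproc://task_events")
    sub = ctx.socket(zmq.SUB)
    sub.connect("inproc://task_events")
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to everything
    await asyncio.sleep(0.1)  # give the subscription a moment to propagate
    await pub.send_string("task submitted")
    print(await sub.recv_string())
    pub.close()
    sub.close()
    ctx.term()

asyncio.run(main())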
A very interesting side-effect is that once a coroutine is implemented in this way it could be run as an asyncio coroutine, but it could just as easily be run in its own thread, its own process, or even a remote process. This is a free benefit of the implementation with interesting potential, e.g. multi-processing speed-up for large, busy workflows, remote execution of xtriggers, etc.
How We Get There:
I can't tell yet how difficult this will be; it could turn out to be quite simple as many of the queues are already in place.
It doesn't all have to be done at once: we can move code into "top-level" coroutines which run out of lock-step with the main loop (but in the same thread) one function at a time. Here is a suggested pathway:
1. Proof of concept
2. Migrate main loop plugins to a top-level coroutine running out of lock-step with the main loop.
3. XTriggers (xtriggers: re-implement as async functions #3497)
4. Workflow Commands
5. ZMQ
6. Task Pool
7. The Rest

I would tentatively suggest that we should aim to get 1, 2 and 3 into Cylc8 as these are on the pathway in other ways.
Beyond that, the rest relates to event-driven scheduling, which facilitates an efficient and responsive spawn-on-demand solution.
With this in mind, it's not necessary to use PUB/SUB (although it would be sufficient); blocking bits like socket.recv can just wait in a separate thread/process for the send (using REQ/REP or ROUTER/DEALER), or loop with the async recv no-wait option (depending on the pattern).
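As a rough illustration of item 2 on the pathway (hypothetical, not actual Cylc code), a main loop "plugin" such as health checking can be lifted into a top-level coroutine that runs on its own interval, concurrently with the rest of the (still imperative) main loop in the same thread:

import asyncio

async def health_check(interval=0.5):
    # runs on its own schedule, out of lock-step with the main loop
    for _ in range(4):
        print("health check")
        await asyncio.sleep(interval)

async def main_loop(interval=0.2):
    # the remaining main loop, now itself a coroutine
    for _ in range(10):
        print("main loop iteration")
        await asyncio.sleep(interval)

async def main():
    # both run concurrently in the same thread; neither blocks the other
    await asyncio.gather(health_check(), main_loop())

asyncio.run(main())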
Hurdles: