-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pool: fix infinite loop with --start-cycle-point #5604
Conversation
* Closes cylc#5603 * Fixes an infinite loop which could occur where one or more task sequences terminate before the start point. * This triggered a bug whereby the runahead limit was disabled on startup causing an infinite spawning bug.
self._configure_contact() | ||
await self.configure() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated: Swap the order of these two so that the contact file gets written earlier.
This reduces the window of opportunity for two schedulers to start up for the same workflow. Taking a look at the contact file fields, I can't see anything set during Scheduler.configure
which is used in _configure_contact
so the order should be arbitrary.
seq.get_first_point(self.config.start_point) | ||
for seq in self.config.sequences | ||
} | ||
if point is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was caused by a None
value in this list. Due to the logic lower down, this caused the runahead limit point to be set to None
.
* Write the contact file earlier to minimise the window of opportunity for two schedulers to start up for the same workflow.
@@ -1807,7 1807,11 @@ async def _shutdown(self, reason: BaseException) -> None: | |||
sys.stdout.flush() | |||
sys.stderr.flush() | |||
|
|||
if self.contact_data and self.task_job_mgr: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the previous, because contact_data
is set a little earlier, this test needed bodging a little as we could get into the situation where the workflow failed to startup, but did configure its contact file triggering remote-tidy to run on shutdown which caused the database to be created when it shouldn't have been.
Test changed to only run when the database is present, which is a better test, as remote-tidy needs database info to function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well that's a horrid bug!
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
(andconda-environment.yml
if present).CHANGES.md
entry included if this is a change that can affect users?.?.x
branch.