custom outputs: output specific retry delays #5652

Open
oliver-sanders opened this issue Jul 27, 2023 · 3 comments
oliver-sanders (Member) commented Jul 27, 2023

The optional outputs extension proposal opens up new opportunities for task outputs.

For example, the following workflow uses the new completion expression to tolerate one particular type of error:

   [scheduling]
       initial cycle point = 2000
       [[graph]]
           P1Y = """
               get_data:data_not_ready? => get_fallback_data
               get_data? | get_fallback_data => do_something
           """
      
   [runtime]
       [[get_data]]
           # if the "wget" command fails, send the exit code back
           # using "cylc message", then exit with the same code so
           # that the job still fails
           script = """
               wget "$URL/$CYLC_TASK_CYCLE_POINT" || {
                   RET=$?
                   cylc message -- "$RET"
                   exit "$RET"
               }
           """
      
           # allow this task to fail, but only if the error
           # was "data_not_ready"
           completion = succeeded or (failed and data_not_ready)
      
           # turn particular exit codes into custom outputs
           # (these can then be used in the graph)
           [[[outputs]]]
               network_failure = 4
               authentication_failure = 6
               data_not_ready = 8

In this example, the task get_data will still fail, but the failure is tolerated (thanks to optional outputs) because the completion condition permits failure when the data_not_ready output has also been produced.
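To make the logic concrete, here is a minimal Python sketch (not Cylc's implementation) that maps the `wget` exit codes from the example onto the custom outputs and then evaluates the completion expression `succeeded or (failed and data_not_ready)`. Modelling outputs as a set of name strings, and the helper names `outputs_for_exit_code` and `is_complete`, are assumptions for illustration only. The sketch also ignores other task outputs such as submitted and started.

```python
# Hypothetical sketch: task outputs are modelled as a set of names.
# The exit codes match the [[[outputs]]] section of the example.
EXIT_CODE_OUTPUTS = {
    4: "network_failure",
    6: "authentication_failure",
    8: "data_not_ready",
}


def outputs_for_exit_code(exit_code: int) -> set[str]:
    """Return the outputs a get_data job would produce for a given exit code."""
    if exit_code == 0:
        return {"succeeded"}
    outputs = {"failed"}
    custom = EXIT_CODE_OUTPUTS.get(exit_code)
    if custom is not None:
        # the script sent this code via "cylc message" before exiting
        outputs.add(custom)
    return outputs


def is_complete(outputs: set[str]) -> bool:
    """Evaluate: completion = succeeded or (failed and data_not_ready)."""
    return "succeeded" in outputs or (
        "failed" in outputs and "data_not_ready" in outputs
    )


for code in (0, 4, 6, 8):
    outs = outputs_for_exit_code(code)
    print(code, sorted(outs), is_complete(outs))
```

Only exit codes 0 and 8 satisfy the expression. A network or authentication failure leaves the task incomplete, which is exactly where per-output retry behaviour becomes interesting.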

In this example, a user might reasonably want to configure retries differently for each case, e.g.:

  • :network_failure -> retry delays = PT5M - retry every 5 mins until the network comes back
  • :authentication_failure -> retry delays = (empty) - no point retrying
  • :data_not_ready -> retry delays = 1 * PT10M - give it 10 mins, retry once, then give up
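
In Cylc, a retry delays setting is a list in which each entry is one retry, and `N * DURATION` repeats an entry `N` times. The Python sketch below expands such strings so that the number of retries in each case above is explicit. It is a hypothetical helper, not Cylc code, and it only understands a simplified `PT…H…M…S` subset of ISO 8601 durations.

```python
import re
from datetime import timedelta

# Simplified subset of ISO 8601 durations, e.g. PT5M, PT1H30M, PT45S.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a simplified ISO 8601 duration (hours/minutes/seconds only)."""
    match = _DURATION.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def expand_retry_delays(setting: str) -> list[timedelta]:
    """Expand e.g. '2 * PT5M, PT10M' into one delay per retry.

    An empty setting means no retries.
    """
    delays: list[timedelta] = []
    for item in setting.split(","):
        item = item.strip()
        if not item:
            continue
        if "*" in item:
            count, duration = item.split("*", 1)
            delays.extend([parse_duration(duration)] * int(count))
        else:
            delays.append(parse_duration(item))
    return delays


print(len(expand_retry_delays("PT5M")))   # network_failure: 1 retry
print(len(expand_retry_delays("")))       # authentication_failure: 0 retries
print(expand_retry_delays("1 * PT10M"))   # data_not_ready: 1 retry after 10 min
```

Under these list semantics, a single `PT5M` gives one retry. Retrying repeatedly while waiting for the network would need a repeat count, such as `12 * PT5M`.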

To support this we could consider something like:

   [runtime]
       [[get_data]]
           [[[retry delays]]]
               # use the first matching delay
               network_failure = PT5M
               data_not_ready = 1 * PT10M
               failed =
       [[do_something]]
           execution retry delays = PT10M  # shorthand for "[retry delays]failed = PT10M"
           submission retry delays = PT10M  # shorthand for "[retry delays]submit-failed = PT10M"
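
The "use the first matching delay" rule could work roughly as follows. This is a hypothetical Python sketch, not existing Cylc behaviour. It models the proposed `[[[retry delays]]]` section as an ordered dict from output name to an already-expanded list of delays, and it models a failed job's outputs as a set of names. The function returns the delays for the first configured output that the job actually produced. This means the specific custom outputs must be listed before the generic `failed` entry. The names `select_retry_delays` and `expand_shorthand` are illustrative assumptions.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical model of the proposed [[[retry delays]]] section for get_data.
# Python dicts preserve insertion order, mirroring "use the first matching delay".
GET_DATA_RETRY_DELAYS: dict[str, list[timedelta]] = {
    "network_failure": [timedelta(minutes=5)],
    "data_not_ready": [timedelta(minutes=10)],  # 1 * PT10M
    "failed": [],  # any other failure: do not retry
}


def select_retry_delays(
    retry_delays: dict[str, list[timedelta]], outputs: set[str]
) -> Optional[list[timedelta]]:
    """Return the delays for the first configured output the job produced."""
    for output, delays in retry_delays.items():
        if output in outputs:
            return delays
    return None  # no entry matched


def expand_shorthand(
    execution: list[timedelta], submission: list[timedelta]
) -> dict[str, list[timedelta]]:
    """Expand the proposed shorthands into [retry delays] entries."""
    return {"failed": execution, "submit-failed": submission}


# A job that exited with code 4 produces both "failed" and "network_failure";
# the more specific entry wins because it is listed first.
print(select_retry_delays(GET_DATA_RETRY_DELAYS, {"failed", "network_failure"}))
print(select_retry_delays(GET_DATA_RETRY_DELAYS, {"failed", "authentication_failure"}))
```

An ordered first-match rule keeps the configuration simple. Its cost is that entry order becomes significant, so a generic `failed` entry placed first would shadow every custom output.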

oliver-sanders added the question label (Flag this as a question for the next Cylc project meeting.) Jul 27, 2023
oliver-sanders added this to the cylc-8.x milestone Jul 27, 2023

hjoliver (Member) commented:

A good idea!

oliver-sanders removed the question label (Flag this as a question for the next Cylc project meeting.) Jul 28, 2023

oliver-sanders (Member, Author) commented:

Skimming through the [runtime] section to see whether any other configs might benefit from the same treatment...

[events]handler events would make sense. It turns out we already support this, so I've added docs for it: cylc/cylc-doc#632

ColemanTom (Contributor) commented:

I've been wanting this functionality for a long time, so yes, please.
