Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log dir tidy #4836

Merged
merged 26 commits into from
Jul 1, 2022
Merged

Log dir tidy #4836

merged 26 commits into from
Jul 1, 2022

Conversation

datamel
Copy link
Contributor

@datamel datamel commented Apr 25, 2022

These changes close #4802

Log directory renaming:

  • log/flow-config => log/config
  • log/workflow => log/scheduler

New Directory for remote install logs: log/remote-install/

Filenames for remote file installation are now on an install-target basis with filename format: <timestamp>-<start/restart/reload>-<install-target>.log

Filename format change for scheduler log:

<log number>-<start/restart>-<start_number>.log
(log/scheduler/log remains a symlink to the current log)

Filename format change for config log:

After a lengthy discussion UK end,
<log number>-<start/restart/reload>-<start_number>.log
The reload start number should correspond to the current start number when the reload took place.

Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg and conda-environment.yml.
  • Appropriate tests are included (unit and/or functional).
  • Appropriate change log entry included.
  • (master branch) I have opened a documentation PR at log tidy changes cylc-doc#463.

@datamel datamel marked this pull request as draft April 25, 2022 14:10
@datamel datamel self-assigned this Apr 26, 2022
@datamel datamel force-pushed the log-dir-tidy branch 5 times, most recently from 797686c to bddc613 Compare April 27, 2022 12:01
@datamel datamel added this to the cylc-8.0rc4 milestone Apr 28, 2022
@datamel datamel mentioned this pull request Apr 28, 2022
1 task
@datamel datamel marked this pull request as ready for review April 28, 2022 11:26
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few minor suggestions.

CHANGES.md Outdated Show resolved Hide resolved
cylc/flow/pathutil.py Outdated Show resolved Hide resolved
cylc/flow/loggingutil.py Outdated Show resolved Hide resolved
cylc/flow/scheduler_cli.py Outdated Show resolved Hide resolved
cylc/flow/scheduler_cli.py Outdated Show resolved Hide resolved
@wxtim
Copy link
Member

wxtim commented May 3, 2022

Adding a pointless comment so that I get a notification when this gets merged, so that I can start work on cylc-review.

@oliver-sanders
Copy link
Member

Adding a pointless comment so that I get a notification

BTW you can [un]subscribe to an issue/pr from the pane on the right hand side (under milestone), especially useful for tracking issues in other projects.

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one small efficiency comment:

cylc/flow/loggingutil.py Outdated Show resolved Hide resolved
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I think I found one issue in testing, cylc cat-log has a feature for setting the log rotation number which I think needs to be updated to the new filename pattern?:

cylc cat-log <workflow> -r 1

Reviewing I've had a couple of ideas (whoops):

1) Use log number in log filenames

Here is the log directory for a workflow I have run, then consequently restarted, the log files have "rolled over" three times:

`-- scheduler
    |-- 20220509T111659 01-start.log
    |-- 20220509T111701 01-start.log
    |-- 20220509T111704 01-start.log
    |-- 20220509T111722 01-restart.log
    |-- 20220509T111732 01-restart.log
    `-- log -> 20220509T111732 01-restart.log

Internally we keep a record of the "log number" which gets incremented with each "rollover", you can find this near the top of each file e.g:

 Run: (re)start=1 log=1

We might be able to template this into the filename e.g:

`-- scheduler
    |-- 01-20220509T111659 01-start.log
    |-- 02-20220509T111701 01-start.log
    |-- 03-20220509T111704 01-start.log
    |-- 04-20220509T111722 01-restart.log
    |-- 05-20220509T111732 01-restart.log
    `-- log -> 05-20220509T111732 01-restart.log

Or:

`-- scheduler
    |-- 20220509T111659 01-01-start.log
    |-- 20220509T111701 01-02-start.log
    |-- 20220509T111704 01-03-start.log
    |-- 20220509T111722 01-04-restart.log
    |-- 20220509T111732 01-05-restart.log
    `-- log -> 20220509T111732 01-05-restart.log

Or possibly even (after all there are timestamps in the log [by default]):

`-- scheduler
    |-- 01-start-01.log
    |-- 02-start-01.log
    |-- 03-start-01.log
    |-- 04-restart-02.log
    |-- 05-restart-02.log
    `-- log -> 05-restart-02.log

Ideas?

Would be good if we could differentiate a rollover from a restart from the filename as workflows are often "restarted in anger" when problems occur.

2) Write the rsync command to the file install log

Might be nice to log the rsync command at the top of the fileinstall log. Makes it clearer what fileinstall means and how to interpret the stdout e.g:

diff --git a/cylc/flow/task_remote_mgr.py b/cylc/flow/task_remote_mgr.py
index 0c3a78768..be72ceb2e 100644
--- a/cylc/flow/task_remote_mgr.py
    b/cylc/flow/task_remote_mgr.py
@@ -56,6  56,7 @@ from cylc.flow.platforms import (
 )
 from cylc.flow.remote import construct_rsync_over_ssh_cmd, construct_ssh_cmd
 from cylc.flow.subprocctx import SubProcContext
 from cylc.flow.util import format_cmd
 from cylc.flow.wallclock import get_current_time_string
 from cylc.flow.workflow_files import (
     KeyInfo,
@@ -532,13  533,14 @@ class TaskRemoteMgr:
             Path(install_log_path).parent.mkdir(parents=True, exist_ok=True)
             with open(install_log_path, 'a') as install_log:
                 install_log.write(
-                    'File installation information for '
-                    f'{install_target}:\n{ctx.out}'
                     f'$ {format_cmd(ctx.cmd, maxlen=80)}'
                     '\n\n### STDOUT:'
                     f'\n{ctx.out}'
                 )
                 if ctx.err:
                     install_log.write(
-                        '\n File installation error for '
-                        f'{install_target}:\n{ctx.err}'
                         '\n\n### STDERR:'
                         f'\n{ctx.err}'
                     )
         if ctx.ret_code == 0:
             # Both file installation and remote init success

That produces files like this:

$ rsync --delete \ 
    --rsh=ssh -oBatchMode=yes -oConnectTimeout=10 \ 
    --include=/.service/ --include=/.service/server.key -a \ 
    --checksum --out-format=%o %n%L --no-t --exclude=log \ 
    --exclude=share --exclude=work --include=/app/*** \ 
    --include=/bin/*** --include=/etc/*** --include=/lib/*** \ 
    --exclude=* ~/cylc-run/remotes/run4/ \ 
    myhost:$HOME/cylc-run/remotes/run4/

### STDOUT:
send .service/server.key


### STDERR:
some stderr...

@datamel
Copy link
Contributor Author

datamel commented May 9, 2022

Thanks for the review.

I think I found one issue in testing, cylc cat-log has a feature for setting the log rotation number which I think needs to be updated to the new filename pattern?:

I'll have a look into this now.

1) Use log number in log filenames

...
We might be able to template this into the filename

Yes, good idea, I'll give this a whirl.
My preference would be for

-- scheduler
    |-- 01-start.log
    |-- 02-start.log
    |-- 03-start.log
    |-- 04-restart.log
    |-- 05-restart.log
    `-- log -> 05-restart.log

As it seems tidier to me, like you said we have timestamp in the files.

2) Write the rsync command to the file install log

Yes, love this idea. That would be super helpful for debugging.

@datamel datamel marked this pull request as draft May 12, 2022 12:52
@hjoliver
Copy link
Member

Yes, good idea, I'll give this a whirl.
My preference would be for ...

👍 mine too!

@datamel datamel marked this pull request as ready for review May 26, 2022 14:09
@hjoliver
Copy link
Member

hjoliver commented Jun 6, 2022

One typing issue causing trouble:

 cylc/flow/loggingutil.py:148: error: Need type annotation for "header_records" (hint: "header_records: List[<type>] = ...")  [var-annotated]

@MetRonnie MetRonnie self-requested a review June 8, 2022 11:12
@oliver-sanders
Copy link
Member

Have done a quick functional review, looks good. The log numbers increment correctly, the housekeeping is working well.

One small thing, the start number in the filename is one higher than the start number in the log itself, easy fix:

$ head scheduler/32-start-01.log 
2022-06-08T12:47:49Z INFO - Run: (re)start=0 log=32
$ head scheduler/37-restart-02.log 
2022-06-08T12:48:28Z INFO - Run: (re)start=1 log=5

@oliver-sanders
Copy link
Member

The config logs look ok, however, I think the log number here is not needed? Just need the start number?

|-- config
|   |-- 01-start-01.cylc
|   |-- 02-restart-02.cylc
|   `-- flow-processed.cylc

Copy link
Member

@MetRonnie MetRonnie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.

I am a little confused by the numbers e.g. 03-restart-04.log. What do the 2 numbers mean? It doesn't match the PR description

Filename format change for scheduler log:
<timestamp>-<start/restart>.log

cylc/flow/loggingutil.py Show resolved Hide resolved
cylc/flow/loggingutil.py Outdated Show resolved Hide resolved
cylc/flow/loggingutil.py Outdated Show resolved Hide resolved
tests/unit/test_loggingutil.py Outdated Show resolved Hide resolved
tests/unit/test_loggingutil.py Outdated Show resolved Hide resolved
cylc/flow/pathutil.py Outdated Show resolved Hide resolved
cylc/flow/pathutil.py Outdated Show resolved Hide resolved
cylc/flow/scheduler.py Outdated Show resolved Hide resolved
@oliver-sanders oliver-sanders merged commit a24bdcd into cylc:master Jul 1, 2022
@datamel datamel deleted the log-dir-tidy branch July 1, 2022 14:46
wxtim added a commit to wxtim/cylc that referenced this pull request Jul 4, 2022
* master: (30 commits)
  Log dir tidy (cylc#4836)
  Cut down on back-compat warnings, plus other tidying (cylc#4943)
  Validate platform settings (background job runner) (cylc#4938)
  clean: push error message to stderr
  Update cylc/flow/id_match.py
  Fix unintended directory stripping during cylc install (cylc#4931)
  stop cylc validate swallowing useful errors (cylc#4936)
  Improve config reference docs (cylc#4916)
  glblcfg: increase default rolling archive length
  Fix job state with pre-submitted failure
  Update tests/functional/cylc-reinstall/02-failures.t
  reinstall: require workflow ID argument
  play: upgrade --start-task's specified in legacy format (cylc#4925)
  expand schema docstring internally (cylc#4926)
  Added a new task filtering test.
  Add comment [skip ci]
  Fix optparse Options class for std options (cylc#4919)
  uid: warn if the run number is provided as a cycle
  Tweak prev.
  Fix task filtering.
  ...
@hjoliver hjoliver modified the milestones: cylc-8.0rc4, cylc-8.0.0 Jul 5, 2022
MetRonnie added a commit to MetRonnie/cylc-flow that referenced this pull request Jul 12, 2022
MetRonnie added a commit to MetRonnie/cylc-flow that referenced this pull request Jul 13, 2022
MetRonnie added a commit to MetRonnie/cylc-flow that referenced this pull request Jul 15, 2022
oliver-sanders pushed a commit that referenced this pull request Jul 20, 2022
Fix failure to clean on remote host with leftover contact file
* Fix scan for recent log dir changes
* Follow-up to #4836
* GH Actions: pass if codecov upload fails (codecov upload is flaky)
* Update changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tidy of log directory
5 participants