
Large suites: light weight task proxies. #1689

Open · hjoliver opened this issue Dec 6, 2015 · 10 comments
Labels: efficiency (for notable efficiency improvements)
Milestone: soon

hjoliver (Member) commented Dec 6, 2015

A major limiting factor for very large suites is memory use by task proxies. This issue is about minimizing the size of task proxies (the total number of them matters too, but less so if they become very small).

Task proxies have two distinct purposes:

  1. Scheduling: they keep track of task prerequisites and outputs.
    • Small, and used throughout the task proxy life cycle (for dependency matching).
  2. Holding task runtime settings, which are written to the job file submitted to run the task.
    • Potentially very large, but only needed at job submit time.

Clearly we need to either eliminate all duplication of runtime settings that could be shared between task proxies, or load runtime settings only when they are needed at job submit time and then immediately forget them again.
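
A rough sketch of this two-part split (hypothetical names, not the actual cylc classes): the scheduling state is the small part that must persist, and `rtconfig` is the large part to share between proxies or load on demand.

```python
# Minimal sketch of the two parts of a task proxy (hypothetical names,
# not the actual cylc classes): scheduling state vs. runtime settings.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskProxy:
    name: str
    cycle_point: str
    # 1. Scheduling state: small, needed throughout the life cycle for
    # dependency matching.
    prerequisites: dict = field(default_factory=dict)  # e.g. {"foo.1:succeeded": False}
    outputs: dict = field(default_factory=dict)        # e.g. {"succeeded": False}
    # 2. Runtime settings: potentially very large, but only needed at job
    # submit time - the part to share or to load on demand and then drop.
    rtconfig: Optional[dict] = None
```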

hjoliver (Member, Author) commented Dec 6, 2015

Eliminate duplication of runtime settings?

Default settings are now shared by task proxies, by means of intervening in the dict look-up mechanism: if a requested item doesn't exist, look it up in the shared defaults dict (#1500).
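
A minimal sketch of that kind of look-up fallback (illustrative only; the actual #1500 implementation may differ):

```python
# Sketch of the shared-defaults fallback: per-task dicts fall back to one
# shared defaults dict, so default values are stored once, not once per
# task proxy.
SHARED_DEFAULTS = {"script": "", "execution time limit": None}

class TaskRuntime(dict):
    def __missing__(self, key):
        # Items not set explicitly for this task resolve to the shared dict.
        return SHARED_DEFAULTS[key]

rtconfig = TaskRuntime({"script": "run-model.sh"})
assert rtconfig["execution time limit"] is None  # served from SHARED_DEFAULTS
```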

Inherited settings could also be shared rather than duplicated across the inheriting tasks (I think we're still duplicating them...). Could this be done like the defaults, or is storing references to the same data enough?

However, the minimum size with all possible sharing of data is still potentially very large: it is the amount of information under [runtime] after Jinja2 processing but before inheritance is worked through (plus defaults).

hjoliver (Member, Author) commented Dec 6, 2015

Load runtime settings only at job submit time?

This is clearly the right thing to do given that (a) the minimum size of all task runtimes is potentially very large; and (b) it is not needed by the suite daemon for any purpose other than job submission.

Choices:

  1. Write the runtime dict for each task proxy class to disk at start-up. At job submit time, load it, write the job file, forget it.
    • The extra disk I/O at job submission time will have no impact: it occurs at exactly the point that we already do a similar amount of I/O - to write the job script.
  2. Hold the minimal complete runtime (above) in memory, and work through inheritance again each time before job submission.
    • If inherited settings can all be shared by task proxies instead of duplicated (above), then there is no advantage in delaying inheritance processing until the data is needed. Even if there were an advantage, this would not help suites in which the bulk of runtime settings are not acquired through inheritance.
  3. Read and parse the whole suite definition off disk again prior to every job submission, to extract the runtime for the task.
    • This is clearly a bad idea: it involves a lot more I/O than 1., and a lot more processing. Parsing a large suite can take a significant amount of time (and perhaps just as much memory use as we're trying to avoid, although it could possibly be done by the job submission command).

Therefore, by a process of unassailable logic I propose that we implement 1. above. 😬 Possible caveat, next comment below.
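
A sketch of what 1. could look like, under stated assumptions (the file layout, JSON format, and `write_job_file` routine are all illustrative, not actual cylc code):

```python
import json
import os

def dump_task_runtimes(runtime_confs, conf_dir):
    """At start-up: write one small file per task *name* (not per instance),
    holding that task's fully inherited runtime dict."""
    os.makedirs(conf_dir, exist_ok=True)
    for name, rtconfig in runtime_confs.items():
        with open(os.path.join(conf_dir, name + ".json"), "w") as handle:
            json.dump(rtconfig, handle)

def submit_job(name, point, conf_dir):
    """At job submit time: load the runtime, write the job file, and keep
    no reference, so the data is immediately garbage-collectable."""
    with open(os.path.join(conf_dir, name + ".json")) as handle:
        rtconfig = json.load(handle)
    write_job_file(name, point, rtconfig)  # hypothetical existing routine
```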

(Note that #1428, which in effect dropped all task runtime info immediately after job submission, demonstrated a factor-of-six reduction in memory use for a large suite with a lot of runahead; however, the implementation there had some negative consequences that this proposal does not have, e.g. on monitoring and the ability to re-trigger tasks that have already finished. It may be that optimal sharing of task runtime data could also have achieved a big reduction there.)

hjoliver (Member, Author) commented Dec 7, 2015

Possible caveat: can the task runtime conf files be generated incrementally at start-up without loading the entire runtime configuration - without defaults - into memory first, for inheritance processing? If not, is brief high memory use at start-up that much better than ongoing high memory use? The disk-based solution would still be much simpler (no need to bother with the extra complexity of ensuring optimal sharing of all settings).

matthewrmshin added this to the soon milestone Dec 7, 2015
hjoliver (Member, Author) commented Dec 7, 2015

(this is in fact rather an old idea: #108 (comment))

matthewrmshin (Contributor) commented

With your proposed solution 1, are we going to end up with a new small file per task/job? The only concern is that it increases inode usage on the file system (and some file systems are very unfriendly to lots of small files), but maybe it does not matter.

(On similar note, but unrelated to this issue, perhaps we should move the job-activity.log file one level up, as it is shared between jobs of the same task.)

hjoliver (Member, Author) commented Dec 7, 2015

No, it'll just be one new small file for each task class (parameterized type, I guess; no longer a class), created at suite start-up, i.e. one for each task name. They'll get re-used for each task instance/job.

matthewrmshin (Contributor) commented

Sounds like a good compromise.

hjoliver self-assigned this Dec 8, 2015
hjoliver (Member, Author) commented Dec 8, 2015

To get the full benefit of reduced task proxy size, we need to avoid the initial memory high water mark caused by parsing the suite in its entirety (which includes all the task proxy runtime info). Even when this data is garbage collected after writing the new task proxy runtime config files, the memory may not be returned to the OS by the Python interpreter (although it will be re-used internally) - i.e. the external "resident memory size" of the suite daemon may not go down after suite parsing.

So, we have agreed on the following (via email):

low-memory suite parsing design

(Noting that when a process finishes, all of its memory is released to the OS).

  1. Generate Jinja2-processed file in a subprocess (this is necessarily monolithic, but the whole file is just treated as text by Jinja2).
  2. Read and parse the processed file, skipping over (ignoring) all lines in the [runtime] section.
  3. Read the [runtime] section line by line just to extract the "inherit" item (i.e. not full parsing) for each namespace, then compute the C3 linearization (inheritance order) of each namespace; see the sketch after this list.
  4. Use a process pool to generate the new task config files concurrently, one process for each task: read and parse just the [runtime] namespaces in the inheritance list for the task (and maybe the inheritance can even be done in-place in a single data structure rather than one for each member of the parent list).
  5. At job submit time, have the job-submit command (which is in a sub-process) read the task config file, not the suite daemon.
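
A sketch of steps 3. and 4., assuming a simplified flat [runtime] layout (one level of `[[name]]` namespaces, one name per heading) and illustrative function names; the real suite.rc parsing has more cases to handle:

```python
import re

NAMESPACE_RE = re.compile(r"^\s*\[\[([^\]]+)\]\]\s*$")
INHERIT_RE = re.compile(r"^\s*inherit\s*=\s*(.+)$")

def extract_parents(runtime_lines):
    """Step 3a: scan [runtime] line by line for 'inherit' items only,
    with no full parsing of the (potentially huge) namespace contents."""
    parents, current = {}, None
    for line in runtime_lines:
        match = NAMESPACE_RE.match(line)
        if match:
            current = match.group(1).strip()
            parents.setdefault(current, [])
            continue
        match = INHERIT_RE.match(line)
        if match and current is not None:
            parents[current] = [
                p.strip() for p in match.group(1).split(",") if p.strip()]
    return parents

def c3_merge(seqs):
    """Merge step of the C3 linearization algorithm."""
    result = []
    seqs = [list(s) for s in seqs if s]
    while seqs:
        for seq in seqs:
            head = seq[0]
            # A good head does not appear in the tail of any sequence.
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise ValueError("inconsistent inheritance hierarchy")
        result.append(head)
        seqs = [s for s in ([x for x in s if x != head] for s in seqs) if s]
    return result

def linearize(name, parents):
    """Step 3b: C3 linearization (inheritance order) for one namespace."""
    direct = parents.get(name, [])
    return c3_merge(
        [[name]] + [linearize(p, parents) for p in direct] + [direct])
```

Step 4. could then map a worker function over the task names with e.g. `multiprocessing.Pool`, each worker parsing only the namespaces in its task's inheritance list and writing out that task's config file.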

hjoliver (Member, Author) commented Jun 22, 2016

[meeting]

  • worth trying and profiling
  • some concern about extra I/O, extra files on disk, however:
    • one file per task name, not per instance
    • at read time (just prior to job submit) we already do I/O (write the job script)
    • could use one db for all these files?

hjoliver (Member, Author) commented

#3515 (spawn on demand) does not address this issue directly, but it will reduce the task pool size so much that this may no longer be relevant. Before closing this, though, we should consider the ideas above for reducing the memory footprint of parsing the suite at start-up.
