Enable factory references to create new dimensions on load. #6168

pp-mo · 2024-10-09T07:51:36Z

Relates to #5369, #6165

pp-mo · 2024-10-11T14:27:22Z

Now updated with prototype loading-control object
NOTE: to-dos, at least:

- no proper tests yet
- work needed to implement the loading policy in load_cubes/load_cube
- fix crashing plot tests (now skipped)

stephenworsley

Looking good, just a couple comments.

stephenworsley · 2024-10-11T16:24:07Z

lib/iris/__init__.py

+ return policy
+
+
+def _apply_loading_policy(cubes, policy=None):


There could be a case for making something like this public, perhaps as a method of LoadPolicy. There could conceivably be a situation where a user would want an equivalent combination, but where it's not appropriate to do it on load: for example, where the cubes need some processing to make them mergeable, and that processing cannot be done in a callback.

Yes, I agree this could be a useful general tool in its own right.
In that case it would be useful to expose the "POLICY" settings as simple arguments here
(especially if they get simpler, as described in the "new settings strategy" below).
Going forward, I'll try to make that workable.
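
For illustration only, a minimal sketch of what such a public helper might look like if the policy settings were exposed as plain arguments. The function name `combine_cubes` and both keyword names are hypothetical and not part of this PR. The only Iris behaviour assumed is that a `CubeList` offers `merge()` and `concatenate()`, each returning a new list; the input is duck-typed so that the sketch itself does not import Iris.

```python
def combine_cubes(cubes, sequence="m", repeat_until_unchanged=False):
    """Apply merge ("m") and concatenate ("c") steps to a CubeList-like object.

    Hypothetical helper: `cubes` only needs `merge()` and `concatenate()`
    methods that return a new list-like object, as `iris.cube.CubeList` does.
    """
    if not sequence or set(sequence) - {"m", "c"}:
        raise ValueError(f"sequence must contain only 'm' and 'c', got {sequence!r}")
    while True:
        n_before = len(cubes)
        for step in sequence:
            cubes = cubes.merge() if step == "m" else cubes.concatenate()
        # Stop after one pass, or once a pass no longer reduces the cube count.
        if not repeat_until_unchanged or len(cubes) >= n_before:
            break
    return cubes
```

This would cover the case raised above: load the uncombined cubes (e.g. with `iris.load_raw`), apply whatever fix-ups make them mergeable, then call the helper on the result.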

stephenworsley · 2024-10-14T10:33:24Z

lib/iris/fileformats/rules.py

+ self.found_multiple_refs = False
+
+
+# A single global object (per thread) to record whether multiple reference fields


If there is an instance per thread, is there any risk that behaviour is dependent on how multiprocessing is done? Or is the relevant loading done entirely within a single thread?

I believe the loading code is all single-threaded in operation. It has to be: while it has the input file open, it might not be safe to run parallel loading tasks, since the file system interface may not be thread-safe (e.g. as for netcdf4-python).

stephenworsley · 2024-10-14T10:38:20Z

lib/iris/fileformats/rules.py

+# the latest load operation.
+# This is used purely to implement the iris.LOAD_POLICY.multiref_triggers_concatenate
+# functionality.
+_MULTIREF_DETECTION = MultipleReferenceFieldDetector()


I don't understand why this would be instantiated at the module level rather than during a load call. Would this not change your loading behaviour depending on what files you have already loaded during your session?

It's explained here: all the iris.load_xxx load functions call via _load_collection, which resets this. It is admittedly a weak contract (!).
We need it to behave like the settings controls, i.e. to exist as a common (but per-thread) global reference, so it can affect low-level behaviour without the need to pass the control down through all functions in the call chain.

trexfeathers · 2024-10-14T10:40:25Z

lib/iris/__init__.py

+ def __init__(
+ self,
+ support_multiple_references: bool = False,
+ multiref_triggers_concatenate: bool = False,
+ use_concatenate: bool = False,
+ use_merge: bool = True,
+ cat_before_merge: bool = False,
+ repeat_until_done: bool = False,
+ ):
+ """Container for loading controls."""
+ self.support_multiple_references = support_multiple_references
+ self.multiref_triggers_concatenate = multiref_triggers_concatenate
+ self.use_concatenate = use_concatenate
+ self.use_merge = use_merge
+ self.cat_before_merge = cat_before_merge
+ self.repeat_until_done = repeat_until_done
+
+ def __repr__(self):
+ msg = (
+ "LoadPolicy("
+ f"support_multiple_references={self.support_multiple_references}, "
+ f"multiref_triggers_concatenate={self.multiref_triggers_concatenate}, "
+ f"use_concatenate={self.use_concatenate}, "
+ f"use_merge={self.use_merge}, "
+ f"cat_before_merge={self.cat_before_merge}, "
+ f"repeat_until_done={self.repeat_until_done}"
+ ")"
+ )
+ return msg
+
+ def copy(self):
+ return LoadPolicy(**{key: getattr(self, key) for key in self._allkeys})


I think data classes give you some/all of this, and you can still add your own methods in addition.
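
As a sketch of that suggestion (not the PR's actual code), the same six settings written as a standard-library dataclass. The generated `__init__` and `__repr__` replace the hand-written ones, and `copy()` can lean on `dataclasses.replace`. This assumes the class has no other base-class requirements.

```python
from dataclasses import dataclass, replace


@dataclass
class LoadPolicy:
    """Container for loading controls."""

    support_multiple_references: bool = False
    multiref_triggers_concatenate: bool = False
    use_concatenate: bool = False
    use_merge: bool = True
    cat_before_merge: bool = False
    repeat_until_done: bool = False

    def copy(self):
        # replace() with no changes builds an equal but independent instance.
        return replace(self)
```

The dataclass also supplies `__eq__`, so two policies with the same settings compare equal, which the hand-written version above does not provide.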

trexfeathers · 2024-10-14T11:23:15Z

lib/iris/__init__.py

+ "support_multiple_references",
+ "multiref_triggers_concatenate",
+ "use_concatenate",
+ "use_merge",
+ "cat_before_merge",
+ "repeat_until_done",


I like the idea of giving users more controls to play with, but having 6 booleans is very confusing, so I'm keen to look for ways it could be simpler to understand.

It doesn't look like support_multiple_references is necessary

use_concatenate, use_merge and cat_before_merge could become some sort of controllable sequence e.g. {"c", "m"} / {"m"} / {"m", "c"}.

OK, as discussed offline, I think we can simplify this with benefits.

See new strategy proposals, below.

trexfeathers · 2024-10-14T11:26:27Z

lib/iris/__init__.py

+LOAD_POLICY_LEGACY = LoadPolicy()
+LOAD_POLICY_RECOMMENDED = LoadPolicy(
+ support_multiple_references=True, multiref_triggers_concatenate=True
+)
+LOAD_POLICY_COMPREHENSIVE = LoadPolicy(
+ support_multiple_references=True, use_concatenate=True, repeat_until_done=True
+)


Might be simpler to follow if these 'presets' were all part of another object. Could they be class attributes e.g. LoadPolicy.recommended? It would be good if the only global constant were LOAD_POLICY.

Agreed, I don't like this much.
Under review, watch this space!
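
One possible shape for this, sketched with factory classmethods so that `LoadPolicy.recommended()` and friends live on the class and `LOAD_POLICY` is the only module-level constant. The preset values are copied from the diff above; the classmethod approach itself, and the simplified `__init__`, are assumptions for illustration rather than what the PR does.

```python
class LoadPolicy:
    """Loading controls, with named presets available from the class itself."""

    _defaults = {
        "support_multiple_references": False,
        "multiref_triggers_concatenate": False,
        "use_concatenate": False,
        "use_merge": True,
        "cat_before_merge": False,
        "repeat_until_done": False,
    }

    def __init__(self, **settings):
        unknown = set(settings) - set(self._defaults)
        if unknown:
            raise TypeError(f"unknown settings: {sorted(unknown)}")
        for key, default in self._defaults.items():
            setattr(self, key, settings.get(key, default))

    @classmethod
    def legacy(cls):
        return cls()

    @classmethod
    def recommended(cls):
        return cls(support_multiple_references=True, multiref_triggers_concatenate=True)

    @classmethod
    def comprehensive(cls):
        return cls(
            support_multiple_references=True,
            use_concatenate=True,
            repeat_until_done=True,
        )


# The single global: start from a preset, then adjust if needed.
LOAD_POLICY = LoadPolicy.legacy()
```

Each call returns a fresh instance, so modifying `LOAD_POLICY` cannot silently alter a preset, which a shared class attribute would allow.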

trexfeathers · 2024-10-14T11:28:39Z

lib/iris/__init__.py

+ """Return context manager for temporary options.
+
+ Modifies the given parameters within a context, for the active thread.
+ """


Could do with more detail and examples - I'm struggling to picture use cases.

Agreed. This is not intended to be complete yet -- still just draft ideas
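
To help picture a use case, a minimal sketch of how such a context manager could behave. `LoadSettings`, its two settings and `LOAD_SETTINGS` are hypothetical stand-ins for the real policy object, and the use of `threading.local` is an assumption based on the "for the active thread" wording in the docstring.

```python
import contextlib
import threading


class LoadSettings(threading.local):
    """Hypothetical per-thread settings object with temporary overrides."""

    _allkeys = ("use_merge", "use_concatenate")

    def __init__(self):
        self.use_merge = True
        self.use_concatenate = False

    @contextlib.contextmanager
    def context(self, **kwargs):
        """Temporarily modify the given settings, for the active thread."""
        unknown = set(kwargs) - set(self._allkeys)
        if unknown:
            raise AttributeError(f"unknown settings: {sorted(unknown)}")
        saved = {key: getattr(self, key) for key in kwargs}
        try:
            for key, value in kwargs.items():
                setattr(self, key, value)
            yield self
        finally:
            # Restore the previous values even if the body raised.
            for key, value in saved.items():
                setattr(self, key, value)


LOAD_SETTINGS = LoadSettings()

with LOAD_SETTINGS.context(use_concatenate=True):
    pass  # any load call made here would see use_concatenate=True
```

A typical use would be one awkward dataset that needs concatenation on load, handled without changing the global behaviour for the rest of a script.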

trexfeathers

I've had my rough look around.

I think the user experience is almost there. Seems like a good compromise that still holds the user's hand as much as possible (they don't write the code themselves), while still offering more customisation than currently.

The developer experience feels very complicated. I think this is worth some attention, since the developer experience around loading is already too complicated and, I have noticed, can be a big barrier for less experienced people. Afraid I don't have any immediate suggestions, only that it's worth some time.

Thanks!

pp-mo · 2024-10-14T11:50:34Z

@trexfeathers I think the user experience is almost there. Seems like a good compromise that still holds the user's hand as much as possible (they don't write the code themselves), while still offering more customisation than currently.

I'm not providing additional docstrings yet, but they are clearly needed (if not a userguide section?).
Likewise, as you already said, the new functionality badly needs usage examples
-- on the TODO list

pp-mo · 2024-10-14T11:53:51Z

@trexfeathers The developer experience feels very complicated. I think this is worth some attention, since the developer experience around loading is already too complicated and, I have noticed, can be a big barrier for less experienced people. Afraid I don't have any immediate suggestions, only that it's worth some time.

You're totally right, and I'm currently planning to move all the pieces here into a single sub-module "iris.io.loading", which I think will help a bit.
I have a draft of that, and it gathers the _CubeFilterCollection and _CubeFilter classes, supporting routines "_generate_cubes" and "_load_collection", all the new "policy" code and all the load_xxx functions.
Along with that, I am considering removing the _CubeFilterCollection and _CubeFilter classes altogether as I think they may obscure rather than help.

However, I want to add some proper testcases before I attempt a lift-and-shift, as otherwise some breakages might go unnoticed.

pp-mo · 2024-10-14T14:17:25Z

Current best ideas for a "new control strategy"

Following offline discussion with @trexfeathers

Just 3 control keywords:

- support_multiple_references = True/False
  - (as before, but now implies the old multiref_triggers_concatenate=True behaviour)
- merge_concatenate_sequence = "m" / "c" / "mc" / "cm"
- repeat_until_no_changes = True/False

Plus...

- combine settings in a simple dictionary or namedtuple
- use simple class constants for common choices
- make support_multiple_references True by default, so that it:
  - includes the triggering of a final concatenate
  - delivers the required solution for the key new cases (i.e. time-dependent hybrid height)
  - delivers the existing result for 99% of existing cases
  - can easily be forced to "old behaviour" for the 1% of broken legacy loads
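
A sketch of how the three keywords might sit in a namedtuple, together with a worked mapping from the three old merge/concatenate booleans onto the single sequence string. The names `CombineSettings`, `LEGACY_SETTINGS` and `sequence_from_legacy_flags` are hypothetical. The `"m"` default is an assumption (it mirrors the old `use_merge=True` default), as is raising an error when neither operation is requested, since the proposal lists no value for that case.

```python
from typing import NamedTuple


class CombineSettings(NamedTuple):
    """The three proposed control keywords (defaults partly assumed)."""

    support_multiple_references: bool = True  # proposed new default
    merge_concatenate_sequence: str = "m"  # assumed: matches old use_merge=True
    repeat_until_no_changes: bool = False


# One way to force the "old behaviour" for a broken legacy load.
LEGACY_SETTINGS = CombineSettings(support_multiple_references=False)


def sequence_from_legacy_flags(use_merge=True, use_concatenate=False, cat_before_merge=False):
    """Translate the three old booleans into one sequence string."""
    if use_merge and use_concatenate:
        return "cm" if cat_before_merge else "mc"
    if use_merge:
        return "m"
    if use_concatenate:
        return "c"
    # Assumption: the proposal offers no "neither" option, so reject it here.
    raise ValueError("at least one of use_merge / use_concatenate must be True")
```

Since a namedtuple is immutable, a variation is made with `_replace`, e.g. `CombineSettings()._replace(merge_concatenate_sequence="cm")`, which keeps the shared constants safe from accidental modification.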

pp-mo · 2024-10-14T16:46:58Z

Progress Update :

For now I'm deferring the user-API improvements detailed above.
The plan now is roughly :

- fix the existing functionality
  - note: the latest commit includes changes to the 'unique' keyword in merges, which ought to fix some of the outstanding test failures
- add explicit tests for various loading testcases, including multiple-references ones
- fix the outstanding test failures, and resolve/reinstate skipped tests (i.e. the plotting crashes)
- move all the disparate moving parts into an "iris.io.loading" submodule
- simplify the user API
- ? fix default loading policy to new-style ? (and fix any tests it breaks)
- add new user docstrings, and a whatsnew describing changes

codecov · 2024-10-14T17:35:25Z

Codecov Report

Attention: Patch coverage is 61.94690% with 43 lines in your changes missing coverage. Please review.

Project coverage is 89.64%. Comparing base (54f430b) to head (027b7a0).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
lib/iris/__init__.py	53.33%	24 Missing and 11 partials ⚠️
lib/iris/fileformats/rules.py	74.19%	6 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6168      +/-   ##
==========================================
- Coverage   89.81%   89.64%   -0.18%     
==========================================
  Files          88       88              
  Lines       23178    23277      +99     
  Branches     4313     4333      +20     
==========================================
+ Hits        20818    20867      +49     
- Misses       1628     1663      +35     
- Partials      732      747      +15     

pp-mo mentioned this pull request Oct 9, 2024

Code solutions for time-dependent hybrid height #6165

Open

stephenworsley reviewed Oct 14, 2024

trexfeathers reviewed Oct 14, 2024

pp-mo force-pushed the load_factory_dims branch from 3942968 to e135573 Compare October 14, 2024 16:48

pp-mo added 10 commits October 15, 2024 11:06

Enable factory references to create new dimensions on load.

dc39651

Skip hanging tests.

d712682

Skip more hanging tests.

5e53a08

Adjust misleading comment.

92369ec

Add policy control and auto-detect. NOTE: for now only load, not load…

8563455

…_cubes/load_cube

Add temporary testcode. NB no actual test, just printout.

a692862

Replaced _CubeFilterCollection.merged() with combined(); replace uses…

653561f

… in load;load_cube/load_cubes.

Fix licence header

393be73

Fix to handle empty reference correctly.

ca3955f

Fix tests.

5fa2b46

pp-mo force-pushed the load_factory_dims branch 2 times, most recently from f7ddb65 to 0f1bae3 Compare October 15, 2024 13:41

Simplify policy options and tidy api.

801f9e2

pp-mo force-pushed the load_factory_dims branch from 0f1bae3 to 801f9e2 Compare October 15, 2024 13:55

More documentation of loading options.

edfea05

stephenworsley mentioned this pull request Oct 17, 2024

Document use of new_axis to control merge #6180

Draft

pp-mo added 2 commits October 18, 2024 20:46

Fix doctest.

984a59c

Fix repeated combination.

027b7a0
