-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic ItemCollection implementation #430
Add basic ItemCollection implementation #430
Conversation
I think I lean toward I think it's ok to require users having to actively open an ItemCollection with ItemCollection.from_file. As long as they know that's how you open an ItemCollection. I know it doesn't meet the request from @scottyhq , but I think that's just because before it was ambiguous on if an ItemCollection was a STACObject, because there was the ItemCollection class in stac-api-spec, and it had Now that it's clear an ItemCollection is just a GeoJSON FeatureCollection with features that are STAC Items, it's reasonable that it you would open it differently. |
Wait, I think I'm wrong here....I don't think a STACObject function should return a non STACObject. But the pystac.read_file is just a convenience function, and now that I look at the code I think this is exactly the right way to do it. STACObject still only returns a STACObject, but the |
Finished going through this, looks really good overall, with some comments above. I can see eventually having some high level convenience functions in ItemCollection (such as calculating summaries over an ItemCollection), but I think this great for an initial implementation. |
Codecov Report
@@ Coverage Diff @@
## main #430 /- ##
==========================================
Coverage 89.56% 89.67% 0.11%
==========================================
Files 39 40 1
Lines 5115 5172 57
==========================================
Hits 4581 4638 57
Misses 534 534
Continue to review full report at Codecov.
|
@lossyrob This should be ready for your review, too, if you want to take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm besides the top level read and write methods, which I commented on. I'm -0.5 for the change to those methods, so if there's someone feels strongly it should be that way I'm fine with things getting merged in as is.
pystac/__init__.py
Outdated
@@ -71,7 72,7 @@ | |||
) | |||
|
|||
|
|||
def read_file(href: str) -> STACObject: | |||
def read_file(href: str) -> Union[STACObject, ItemCollection]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to change these top level package read/write methods to include ItemCollection. It breaks user code that may rely on the STACObject type, forcing the type differentiation to happen by the caller. Also the last two parameters of write_file
make it feel a bit shoe-horned. On the other hand, I can see people getting confused about why read_file
wouldn't work on an ItemCollection if they didn't know it wasn't a core stac object.
I think my preference would be to keep these methods working with core STAC types, and force users to treat ItemCollections as a separate concept, which makes more sense with the spec as it currently stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with all of the points you make here, and I think my preference would be to not change the top-level functions as well. If that ends up being the decision, I will add more clear documentation to the ItemCollection
class indicating that it is not a STACObject
and cannot be read using those top-level methods.
@scottyhq I'm curious how much of a priority it is for you to be able to read Item Collections using pystac.read_file
vs. ItemCollection.from_file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this! Yes, I'm coming at this primarily from someone new to STAC, who is unfamiliar with spec details. The key expectation I have is that after searching a STAC API for data, and saving 'results.json' to work with later, there is an straightforward way to open that file and navigate it. For Python it seems like pystac_client
takes care of the searching and pystac
should care of the I/O and navigation of the results. I don't think people should have to understand the concepts of ItemCollections versus Collections, Core vs Not, for this fundamental workflow.
So if pystac.read_file
can't handle ItemCollections, and a separate ItemCollection.from_file()
is the way forward (or pystac_client.read_file()`?), I think that just needs to be clearly documented.
Also useful (and I think a non-breaking change) would then be for pystac.read_file
to have error handing that can recognize the json is an ItemCollection and suggest the correct method to open it, rather than the current KeyError: 'id'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scottyhq, that all makes sense to me. I'll change those top-level functions back to their original signatures and make sure we have clear docs on how to work with ItemCollections. I'm pretty sure the issue with a KeyError
being raised in pystac.read_file
is fixed by #402, but I'll add a test to be sure.
@matthewhanson Looking back at the code example in the original issue it seems like the name of the ItemSearch.items_as_collection
method might also be misleading. Maybe we should rename that to items_as_item_collection
so that users don't think they are saving a STAC Collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed ItemCollection
handling from top-level functions and added test that pystac.read_file
raises a STACTypeError
instead of the KeyError
in 404bb99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@duckontheweb ItemSearch.items_as_collection
was misleading. I've renamed it already, there's now get_pages
(get the raw JSON of the pages), get_item_collection
(gets pages as item collections), get_items
(iterator through all pages and items), and get_all_items
(gets all items from all pages and returns as single ItemCollection). This matches the get_
syntax used in PySTAC
404bb99
to
2bb16a1
Compare
@lossyrob It also just occurred to me that including |
@duckontheweb I agree that's not great, and should have been removed when ItemCollection was dropped. I'm 1 for dropping it from that Enum and using another separate constant in the migration/validation places where needed. |
I removed this in 47017df. I ended up just removing any ItemCollection references in the |
@lossyrob @matthewhanson Ready for another review whenever you have time. Thanks! |
@duckontheweb Looks good to me to merge! |
stac-utils/pystac#430 implemented `pystac.ItemCollection`, which is being removed from `pystac-client` in the next release. This PR adds similar handling for `pystac.ItemCollection`.
Related Issue(s): #
Description:
Adds
ItemCollection
class to work with GeoJSON Feature Collections containing only STAC Items. This class can be serialized/deserialized from JSON, and also implementsCollection
to make it more convenient to loop over items, get an item by index, and check if an Item is included. Note that until we implement more specific STAC object equality (as per #380),assert item in item_collection
will always fail unlessclone_items=False
is used when instantiating theItemCollection
.This PR also updates the I/O functions in
pystac.__init__
to work with the newItemCollection
class in response to #371. SinceItemCollection
s are notSTACObject
s, this means that the read functions in that file (read_file
,read_dict
) are now not guaranteed to return aSTACObject
. This means users will need to either handle both cases in their code, orcast
the result of those functions to the appropriate type. I'm interested in feedback on whether this is worth the change, or if we should keep these functions unchanged and make people useItemCollection.from_file
in order to read ItemCollections.PR Checklist:
scripts/format
)scripts/test
)cc: @scottyhq
UPDATE 2021-06-10:
Implemented
__contains__
to makeItemCollection
atyped.Collection
since we now have the option not to clone Items on instantiation and we are no longer manipulating the Item links.