EPA CEMS Intake Catalog #1564
Comments
I'll do a review on the Intake catalog PR but this is just a few comments from a first pass at the notebook:
Questions
Other nits
Whoops yes I forgot to add the intake requirements. I had them installed in my local environment. The dtypes you've got listed there seem to be the correct ones. But you have no I really don't understand how the source specific metadata works. My suspicion is that the allowable year/state values can be put in there, and the column/table descriptions, but I don't see any documentation on how to do it appropriately.
Hey @martindurant thanks so much for your comment on #1496! I got Do you happen to have a list of publicly visible intake catalogs that use Parquet data sources? I've tried searching GitHub but haven't been very successful. The CarbonPlan Data repo is the best I've seen, but they have a very simple configuration. Once the EPA CEMS Hourly Emissions data source is finished, we also want to look at writing an intake-sqlite driver (#1156) to manage the distribution of versioned SQLite databases, which will download and cache the database file locally, and then use the
Description
Create a full-featured Intake Catalog for distributing the EPA CEMS hourly emissions data stored as Parquet files. This follows some exploration in #1155. See also notes in #1495 and PR #1563.
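For context, here is a rough sketch of the kind of Parquet access the catalog would wrap: reading a remote file with `pandas.read_parquet()` through fsspec's `simplecache` protocol so that repeat reads come from a local copy. The URL, cache directory, and the helper names `cached_url` and `read_cems` are placeholders for illustration only, not the project's actual paths or API, and the read itself assumes `pandas`, `pyarrow`, and `fsspec` are installed.

```python
from pathlib import Path

# Hypothetical location: the real bucket / path is not given in this issue.
REMOTE_URL = "https://example.com/hourly_emissions_epacems.parquet"


def cached_url(url: str) -> str:
    """Chain fsspec's simplecache protocol in front of a URL (idempotent)."""
    if url.startswith("simplecache::"):
        return url
    return f"simplecache::{url}"


def read_cems(url=REMOTE_URL, cache_dir="~/.cache/pudl-demo", columns=None, filters=None):
    """Read a (possibly remote) Parquet file, keeping a local cached copy.

    Assumes pandas, pyarrow and fsspec are installed (plus aiohttp for https).
    """
    import pandas as pd  # imported lazily so the URL helper has no dependencies

    return pd.read_parquet(
        cached_url(url),
        columns=columns,
        filters=filters,  # forwarded to the pyarrow engine
        # Options keyed by protocol name are routed to that layer of the chain.
        storage_options={
            "simplecache": {"cache_storage": str(Path(cache_dir).expanduser())}
        },
    )
```

A call might look like `read_cems(filters=[("year", "=", 2019), ("state", "=", "CO")])`, where the `year` and `state` column names are assumed rather than taken from the actual schema. Note that `simplecache` downloads whole files before reading, so filters reduce what is loaded into memory rather than what is transferred.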
Billing
This work should be billed under our Sloan Foundation "Data Distribution" sub-project.
Goals
Tasks / Issues tracked by this Epic
Phase 1:
Get a functional intake catalog deployed for demonstration & feedback.
read_parquet()
fsspec
/simplecache
(see comment on Scope improvements to intake-parquet #1496)
pudl_catalog
for installation using pip.
Phase 2:
Flesh out metadata and improve performance.
Out of Scope