Problem
We want to convert SOME of the nested keys inside a JSON field into normal fields in a STRUCT to take advantage of columnar storage, while keeping the remaining keys (because these keys are very flexible/sparse) inside the original JSON. The final result is a STRUCT with the selected keys, plus the original JSON trimmed/reduced to a slimmer variant with the common keys removed (so the remaining JSON is much smaller).

Currently, DuckDB does not support JSONPath in json_transform().

It is a common ELT feature to schematize JSON/Variant data into a more structured schema, but we can only do so much because the input JSON can be messy. So the practical approach is to take good control of the commonly used and frequently appearing keys, and leave the other keys in the JSON.
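To make the desired semantics concrete, here is a minimal Python sketch of the split described above. The function name split_json is hypothetical (it is not an existing DuckDB API); stdlib json stands in for the engine's JSON type:

```python
import json

def split_json(doc: str, keys: list[str]) -> tuple[dict, str]:
    """Hypothetical illustration: pick `keys` into a struct-like dict
    and return the remaining document as a slimmer JSON string."""
    obj = json.loads(doc)
    struct = {k: obj.pop(k, None) for k in keys}  # selected keys -> struct
    return struct, json.dumps(obj)                # residual JSON without them

doc = '{"id": 1, "name": "a", "extra": {"x": true}, "rare_key": null}'
struct, slim = split_json(doc, ["id", "name"])
print(struct)  # {'id': 1, 'name': 'a'}
print(slim)    # {"extra": {"x": true}, "rare_key": null}
```

The struct holds the common, columnar-friendly keys, while the sparse/flexible keys stay in the (now smaller) JSON.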
Proposal
json_transform() is similar to Snowflake's object_pick: https://docs.snowflake.com/en/sql-reference/functions/object_pick

But object_delete (https://docs.snowflake.com/en/sql-reference/functions/object_delete) can effectively remove all the top-level keys that have been flattened/shredded. DuckDB probably needs a similar function as well.
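The complementary behavior of the two Snowflake functions can be sketched in plain Python (object_pick and object_delete below re-implement the documented top-level semantics for illustration; they are not DuckDB functions):

```python
def object_pick(obj: dict, keys: list[str]) -> dict:
    # Keep only the listed top-level keys (like Snowflake's OBJECT_PICK).
    return {k: v for k, v in obj.items() if k in keys}

def object_delete(obj: dict, keys: list[str]) -> dict:
    # Drop the listed top-level keys (like Snowflake's OBJECT_DELETE).
    return {k: v for k, v in obj.items() if k not in keys}

row = {"id": 7, "ts": "2024-01-01", "payload": {"a": 1}, "misc": "rare"}
picked = object_pick(row, ["id", "ts"])
rest = object_delete(row, ["id", "ts"])
# Together they partition the object: nothing lost, nothing duplicated.
assert {**picked, **rest} == row
```

This partition property is exactly why a delete-style counterpart is useful once the picked keys have been shredded into columns.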
The ultimate solution contains 2 more innovations:

1. Support JSONPath in the structure parameter for json_transform(), so that we can extract 2nd- and 3rd-level nested keys. Since there is a performance penalty for JSONPath, we could consider a separate function json_transformation_ext() to support JSONPath and keep json_transform() handling only top-level keys.
2. Support a 3rd parameter, remove_selected_keys=true, to also remove those transformed keys from the original JSON. The return then contains a new STRUCT and a new JSON.

This feature will be quite useful for data preparation and schematization.
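A rough sketch of what JSONPath-style extraction combined with remove_selected_keys=true could mean, using a toy dotted-path resolver (extract_paths is hypothetical, and real JSONPath is far richer than "a.b" paths):

```python
import json

def extract_paths(doc: str, paths: list[str]) -> tuple[dict, str]:
    """Pull dotted paths like 'a.b' into a flat struct and remove
    them from the JSON (the remove_selected_keys=true behavior)."""
    obj = json.loads(doc)
    struct = {}
    for path in paths:
        parts = path.split(".")
        node = obj
        for p in parts[:-1]:          # walk to the parent of the leaf
            node = node.get(p, {})
        struct[path] = node.pop(parts[-1], None)  # take and remove the leaf
    return struct, json.dumps(obj)

doc = '{"user": {"id": 42, "tags": ["x"]}, "meta": {"v": 1}}'
struct, slim = extract_paths(doc, ["user.id", "meta.v"])
print(struct)  # {'user.id': 42, 'meta.v': 1}
print(slim)    # {"user": {"tags": ["x"]}, "meta": {}}
```

Note that the extracted 2nd-level keys land in the struct while the nested containers stay behind in the slimmed JSON, which is the shape the proposal asks for.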
Thank you!