Migrating tables to generic partitioning support

vinoth chandar edited this page Mar 27, 2017 · 1 revision

Background

Until 0.3.1, we implicitly assumed that data is partitioned by date (by far the most common layout we have observed), i.e. every partition can be found three levels below the base path, at basePath/year/month/day. With PR121, we plan to generalize this by maintaining a .hoodie_partition_metadata file under each partition.
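
To make the difference between the two discovery strategies concrete, here is a minimal sketch in plain Java. It is not the hoodie-client implementation: the class and method names are hypothetical, it runs against the local filesystem via java.nio rather than HDFS, and the only detail taken from this page is the marker file name .hoodie_partition_metadata.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not part of hoodie-client.
public class PartitionDiscoverySketch {
    static final String PARTITION_METAFILE = ".hoodie_partition_metadata";

    // Date-partitioned assumption: every directory exactly three levels
    // below basePath (year/month/day) is treated as a partition.
    static List<String> listDatePartitions(Path basePath) throws IOException {
        try (Stream<Path> paths = Files.walk(basePath, 3)) {
            return paths
                .filter(Files::isDirectory)
                .filter(p -> basePath.relativize(p).getNameCount() == 3)
                .map(p -> basePath.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Generic approach: any directory, at any depth, that contains the
    // marker file is treated as a partition.
    static List<String> listPartitionsByMetafile(Path basePath) throws IOException {
        try (Stream<Path> paths = Files.walk(basePath)) {
            return paths
                .filter(p -> p.getFileName() != null
                    && p.getFileName().toString().equals(PARTITION_METAFILE))
                .map(p -> basePath.relativize(p.getParent()).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path basePath = Files.createTempDirectory("hoodie-sketch");

        // A date partition without the marker file (pre-migration state).
        Files.createDirectories(basePath.resolve("2017/03/27"));

        // A non-date partition, two levels deep, carrying the marker file.
        Path region = Files.createDirectories(basePath.resolve("region/us"));
        Files.createFile(region.resolve(PARTITION_METAFILE));

        System.out.println("By date layout: " + listDatePartitions(basePath));
        System.out.println("By metafile:    " + listPartitionsByMetafile(basePath));
    }
}
```

In this sketch the date-layout listing sees only 2017/03/27 and the metafile listing sees only region/us, which is why existing date partitions need the metadata added before the date assumption is switched off.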

Avoiding side effects for current deployments

  • Before rolling out the new hoodie-client jar with these changes, set withAssumeDatePartitioning(true) in your HoodieWriteConfig. Without this, hoodie-client looks for partitions based on the metadata and, if it cannot find any, will not be able to write data.

Steps for a clean migration

  • Use the CLI tool's repair addpartitionmeta command to add this metadata to existing partitions/tables.
  • Roll out the new hoodie-client with withAssumeDatePartitioning(false) [the default]; all new partitions will have the metadata going forward.
  • If you plan to have non-date-partitioned tables, upgrade your query engines to the new hoodie-hadoop-mr jar. The old input format will continue to work on date-partitioned tables.
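
Before switching withAssumeDatePartitioning to false, it can be useful to confirm that the repair step reached every existing partition. The following is a rough sketch of such a check, assuming a date-partitioned table on a local filesystem; the class and method are hypothetical helpers, not part of the Hoodie CLI or client, and a real table on HDFS would need the Hadoop FileSystem API instead of java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical pre-flight check; not part of the Hoodie CLI or hoodie-client.
public class MigrationCheckSketch {
    static final String PARTITION_METAFILE = ".hoodie_partition_metadata";

    // Returns the year/month/day directories under basePath that do not
    // yet contain the partition metadata file.
    static List<String> partitionsMissingMetafile(Path basePath) throws IOException {
        try (Stream<Path> paths = Files.walk(basePath, 3)) {
            return paths
                .filter(Files::isDirectory)
                .filter(p -> basePath.relativize(p).getNameCount() == 3)
                .filter(p -> !Files.exists(p.resolve(PARTITION_METAFILE)))
                .map(p -> basePath.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path basePath = Files.createTempDirectory("hoodie-check");

        // One partition that already has the metadata file.
        Path repaired = Files.createDirectories(basePath.resolve("2017/03/26"));
        Files.createFile(repaired.resolve(PARTITION_METAFILE));

        // One partition that was missed.
        Files.createDirectories(basePath.resolve("2017/03/27"));

        List<String> missing = partitionsMissingMetafile(basePath);
        System.out.println("Partitions still missing metadata: " + missing);
    }
}
```

An empty result suggests every date partition has been repaired; any partition listed would be invisible to a client that relies on the metadata alone.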