Apache Hudi / HUDI-499

Allow partition path to be updated with GLOBAL_BLOOM index


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 0.5.2
    • Component/s: index

    Description

      Context

      When a record is updated with a new partition path and the index is set to GLOBAL_BLOOM, the current logic implemented in https://github.com/apache/incubator-hudi/pull/1091/ ignores the new partition path and updates the record in its original partition path.

      Proposed change

      Allow records to be inserted into their new partition paths and deleted from their old partition paths. A configuration (e.g. hoodie.index.bloom.update.partition.path=true) can be added to enable this feature.
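      The tagging decision described above can be sketched in plain Java. This is a minimal, hypothetical model, not Hudi's actual index code: the class GlobalIndexTagger, the Op/OpType/IncomingRecord types, the tag method and its updatePartitionPath flag (standing in for the proposed configuration) are all names invented for illustration. The global index is modeled as a simple map from record key to the partition path where that key currently lives. The sketch uses Java records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Hudi's API: models how a global index could tag
// incoming records when their partition path changes.
public class GlobalIndexTagger {

    enum OpType { INSERT, UPDATE, DELETE }

    // One write operation: what to do with which key in which partition.
    record Op(OpType type, String recordKey, String partitionPath) {}

    // Incoming record: its key plus the partition path derived from its new values.
    record IncomingRecord(String recordKey, String partitionPath) {}

    // globalIndex: record key -> partition path where the key currently lives.
    // updatePartitionPath: hypothetical flag standing in for the proposed config.
    static List<Op> tag(Map<String, String> globalIndex,
                        List<IncomingRecord> incoming,
                        boolean updatePartitionPath) {
        List<Op> ops = new ArrayList<>();
        for (IncomingRecord rec : incoming) {
            String existing = globalIndex.get(rec.recordKey());
            if (existing == null) {
                // Unknown key: insert into the partition derived from the record.
                ops.add(new Op(OpType.INSERT, rec.recordKey(), rec.partitionPath()));
            } else if (existing.equals(rec.partitionPath()) || !updatePartitionPath) {
                // Same partition, or flag off (the behavior described under Context):
                // update the record where the key already lives.
                ops.add(new Op(OpType.UPDATE, rec.recordKey(), existing));
            } else {
                // Partition changed and flag on: delete from the old partition,
                // then insert into the new one.
                ops.add(new Op(OpType.DELETE, rec.recordKey(), existing));
                ops.add(new Op(OpType.INSERT, rec.recordKey(), rec.partitionPath()));
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        // A person whose birthday (the partition field) is being corrected.
        Map<String, String> index = Map.of("person-42", "1999-12-31");
        List<IncomingRecord> batch =
            List.of(new IncomingRecord("person-42", "2001-01-01"));
        System.out.println("flag off: " + tag(index, batch, false));
        System.out.println("flag on:  " + tag(index, batch, true));
    }
}
```

      With the flag off, the corrected record is written back into its old partition as a single UPDATE; with the flag on, it becomes a DELETE in the old partition followed by an INSERT in the new one. Emitting two separate operations keeps the record key unique across the table, which is the property a global index is meant to guarantee.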

      An example use case

      A Hudi dataset manages people's information and is partitioned by birthday. In most cases, when a person's information is updated, the birthday does not change (which is why it was chosen as the partition field). In some edge cases, however, a birthday is entered incorrectly and we want to fix it manually, or we want to let users update it occasionally. In such cases, the proposed change would keep records in the expected partition, so that a query like "show me people who were born after 2000" returns correct results.



            People

              Assignee: Raymond Xu (rxu)
              Reporter: Raymond Xu (rxu)
              Votes: 0
              Watchers: 1


                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m