  1. Apache Hudi
  2. HUDI-6798

Implement event-time-based merging mode in FileGroupReader


Details

    Description

      To implement event-time-based merging, we should add a new table config, hoodie.record.merge.mode, to control the record merging mode and behavior in the new file group reader (HoodieFileGroupReader), and implement event-time ordering in that reader. This table config is intended to be the single config that determines how record merging happens in release 1.0 and beyond.

      Three merging modes to define:

      • OVERWRITE_WITH_LATEST: merges records by transaction time, i.e., the record from the later transaction overwrites the earlier record with the same key. This corresponds to the behavior of the existing payload class OverwriteWithLatestAvroPayload.
      • EVENT_TIME_ORDERING: merges records by event time, i.e., the record with the larger event time overwrites the record with the smaller event time for the same key, regardless of transaction time. The user must specify the event time (preCombine) field. This corresponds to the behavior of the existing payload class DefaultHoodieRecordPayload.
      • CUSTOM: merges records with custom logic specified by the user. When a user specifies a custom record merger strategy, or a payload class with the Avro record merger, this mode is set so that record merging follows the user-defined logic as before.
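
      The three modes above can be illustrated with a minimal, self-contained Java sketch. It does not use the HoodieFileGroupReader API: the names MergeModeSketch, MergeMode, Rec, modeFrom, and merge are hypothetical and exist only for illustration. Only the config key hoodie.record.merge.mode and the three mode names come from this issue. How ties in event time are resolved is not stated here, so the sketch assumes that the record from the later transaction wins a tie.

```java
import java.util.Properties;
import java.util.function.BinaryOperator;

public class MergeModeSketch {
    // Hypothetical enum mirroring the three values proposed for hoodie.record.merge.mode.
    enum MergeMode { OVERWRITE_WITH_LATEST, EVENT_TIME_ORDERING, CUSTOM }

    // Hypothetical, simplified record; this is not a Hudi class.
    static final class Rec {
        final String key;
        final long eventTime; // value of the user-specified event time (preCombine) field
        final String value;

        Rec(String key, long eventTime, String value) {
            this.key = key;
            this.eventTime = eventTime;
            this.value = value;
        }
    }

    // Reads the merge mode from table properties; fails loudly if it is missing.
    static MergeMode modeFrom(Properties tableProps) {
        String raw = tableProps.getProperty("hoodie.record.merge.mode");
        if (raw == null) {
            throw new IllegalArgumentException("hoodie.record.merge.mode is not set");
        }
        return MergeMode.valueOf(raw);
    }

    // Merges two records with the same key. `older` comes from the earlier
    // transaction and `newer` from the later one.
    static Rec merge(MergeMode mode, Rec older, Rec newer, BinaryOperator<Rec> custom) {
        switch (mode) {
            case OVERWRITE_WITH_LATEST:
                // Transaction time decides: the later write always wins.
                return newer;
            case EVENT_TIME_ORDERING:
                // Event time decides. Assumption: on a tie, the later write wins.
                return newer.eventTime >= older.eventTime ? newer : older;
            case CUSTOM:
                // Delegate to user-defined logic.
                return custom.apply(older, newer);
            default:
                throw new IllegalArgumentException("Unknown merge mode: " + mode);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hoodie.record.merge.mode", "EVENT_TIME_ORDERING");

        // A late-arriving record: written later, but with a smaller event time.
        Rec older = new Rec("k1", 200L, "written-first");
        Rec newer = new Rec("k1", 100L, "written-second");

        Rec merged = merge(modeFrom(props), older, newer, (a, b) -> b);
        System.out.println(merged.value);
    }
}
```

      With a late-arriving record like the one in main, OVERWRITE_WITH_LATEST and EVENT_TIME_ORDERING pick different winners, which is why a single explicit mode setting is useful.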

            People

              guoyihua Ethan Guo
