Apache Hudi / HUDI-2751

Avoid duplicates for streaming read of MOR table


Details

    Description

      Imagine there are commits on the timeline:

                               ----- delta-99 ----- commit 100 (includes delta-99 data set) ----- delta-101 ----- delta-102 -----
                               first read ->|       second read ->
                               -- range 1 --|---------------------------------- range 2 -----------------------------------|
      
      

      Instants 99, 101, and 102 are successful non-compaction delta commits;
      instant 100 is a successful compaction instant.

      The first incremental read consumes up to instant 99, and the second read consumes from instant 100 to instant 102. The second read would therefore consume the commit files of instant 100, whose data (the delta-99 data set) has already been consumed by the first read.

      The duplicate read happens under this condition: a compaction instant is scheduled and then completes within a single consume range.
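
      The scenario can be reproduced with a toy timeline. The following is a minimal sketch, not Hudi code: the `Instant` class, the `read_range` helper, the record keys, and the `skip_compaction` flag are all hypothetical names introduced for illustration. Skipping the compaction instant in the second range is shown as one possible mitigation, under the assumption that compaction only rewrites data already present in earlier delta commits; it is not necessarily the fix adopted for this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    """Toy stand-in for a timeline instant (hypothetical, not a Hudi class)."""
    ts: int             # instant time
    action: str         # "deltacommit" or "commit" (a completed compaction)
    records: frozenset  # record keys contained in the files of this instant


def read_range(timeline, start_exclusive, end_inclusive, skip_compaction):
    """Return the record keys consumed by one incremental read.

    Reads every instant with start_exclusive < ts <= end_inclusive.
    When skip_compaction is True, compaction instants are ignored, on the
    assumption that they only rewrite data from earlier delta commits.
    """
    consumed = []
    for inst in timeline:
        if not (start_exclusive < inst.ts <= end_inclusive):
            continue
        if skip_compaction and inst.action == "commit":
            continue
        consumed.extend(sorted(inst.records))
    return consumed


# Timeline from the description: 99, 101, 102 are delta commits;
# 100 is a compaction whose files include the delta-99 data set.
TIMELINE = [
    Instant(99, "deltacommit", frozenset({"a", "b"})),
    Instant(100, "commit", frozenset({"a", "b"})),
    Instant(101, "deltacommit", frozenset({"c"})),
    Instant(102, "deltacommit", frozenset({"d"})),
]

# Range 1: everything up to instant 99.
first = read_range(TIMELINE, 0, 99, skip_compaction=False)

# Range 2, read naively: instant 100 re-emits the delta-99 records.
naive_second = read_range(TIMELINE, 99, 102, skip_compaction=False)
duplicates_naive = sorted(set(first) & set(naive_second))

# Range 2 with the compaction instant skipped: no overlap with range 1.
fixed_second = read_range(TIMELINE, 99, 102, skip_compaction=True)
duplicates_fixed = sorted(set(first) & set(fixed_second))
```

      In this sketch the naive second read overlaps the first read on the keys written by delta-99, while the filtered second read returns only the keys from delta-101 and delta-102.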

          People

            shivnarayan sivabalan narayanan
            danny0405 Danny Chen