Apache Hudi / HUDI-2496

Inserts are precombined even with dedup disabled


Details

    Description

      Original GitHub issue: https://github.com/apache/hudi/issues/3709

      Test case by xushiyan: https://github.com/apache/hudi/pull/3723/files

      Root cause analysis (RCA) by shivnarayan:

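      Within HoodieMergeHandle, incoming records are stored in a hash map keyed by record key. As a result, duplicates in the first batch remain intact, but for the second batch only one record per key survives (a map holds a single value per key), and those unique records are then concatenated with the first batch. The second batch therefore ends up precombined even though deduplication is disabled.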
      https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
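The RCA above can be illustrated with a small, simplified model. This is a hypothetical Python sketch, not Hudi code: `Record`, `insert_path`, and `merge_path` are illustrative names, a plain dict stands in for the hash map in HoodieMergeHandle, and the model follows the RCA literally (existing records are kept and the deduplicated incoming records are appended), ignoring how records whose keys already exist in the file would be merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str      # record key (hypothetical model of a Hudi record)
    payload: str  # stands in for the rest of the row


def insert_path(batch):
    # First batch (hypothetical): written as-is, so duplicate keys survive.
    return list(batch)


def merge_path(existing, incoming):
    # Second batch (hypothetical): incoming records are buffered in a map keyed
    # by record key, mirroring the hash map described in the RCA. A dict holds
    # one value per key, so a later record with the same key replaces an
    # earlier one.
    keyed = {}
    for rec in incoming:
        keyed[rec.key] = rec
    # Simplified: the deduplicated records are concatenated with what was
    # already written by the first batch.
    return existing + list(keyed.values())


# Both batches contain a duplicated key (k1 in the first, k3 in the second).
batch1 = [Record("k1", "a"), Record("k1", "b"), Record("k2", "c")]
batch2 = [Record("k3", "d"), Record("k3", "e"), Record("k4", "f")]

file_contents = insert_path(batch1)
file_contents = merge_path(file_contents, batch2)

print([r.key for r in file_contents])  # ['k1', 'k1', 'k2', 'k3', 'k4']
```

In this model both `k1` records from the first batch survive, while only one `k3` record from the second batch does, which is the inconsistency the issue reports.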


            People

              helias_an Helias Antoniou
              codope Sagar Sumit
              Votes: 0
              Watchers: 4
