Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Original GH issue: https://github.com/apache/hudi/issues/3709
Test case by xushiyan: https://github.com/apache/hudi/pull/3723/files
RCA by shivnarayan:
Within HoodieMergeHandle, we use a hashmap to store incoming records, keyed by record key. Because a map holds only one entry per key, duplicate keys within an incoming batch collapse to a single record.
As a result, in the 1st batch duplicates remain intact, but in the 2nd batch only unique records are considered, and these are later concatenated with the 1st batch.
https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
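To make the RCA concrete, here is a minimal, self-contained sketch of the effect. It is not Hudi's code: the class `MergeDedupSketch`, the `Rec` type, and the methods `writeFirstBatch` / `mergeSecondBatch` are hypothetical. It assumes the 1st batch is written as-is, and that the 2nd batch's keys do not overlap the 1st batch's, so the merge reduces to the concatenation described above. Which duplicate survives in the map (here, the last one put) is a property of this sketch, not a claim about Hudi.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDedupSketch {
    // Hypothetical record: a record key plus a payload value.
    public static final class Rec {
        public final String key;
        public final String value;

        public Rec(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // 1st batch (assumption): written as-is, with no map involved, so duplicate keys survive.
    public static List<Rec> writeFirstBatch(List<Rec> incoming) {
        return new ArrayList<>(incoming);
    }

    // 2nd batch: incoming records are buffered in a map keyed by record key, as the RCA
    // describes for HoodieMergeHandle. Map.put replaces any earlier value for the same key,
    // so duplicates within the batch collapse to one record (the last one, in this sketch).
    public static List<Rec> mergeSecondBatch(List<Rec> existing, List<Rec> incoming) {
        Map<String, Rec> incomingByKey = new LinkedHashMap<>();
        for (Rec r : incoming) {
            incomingByKey.put(r.key, r);
        }
        // Concatenate: existing records followed by the now-unique incoming records.
        List<Rec> out = new ArrayList<>(existing);
        out.addAll(incomingByKey.values());
        return out;
    }

    public static void main(String[] args) {
        List<Rec> batch1 = List.of(new Rec("k1", "a"), new Rec("k1", "b"));
        List<Rec> batch2 = List.of(new Rec("k2", "c"), new Rec("k2", "d"));
        List<Rec> file = mergeSecondBatch(writeFirstBatch(batch1), batch2);
        // prints 3: k1 twice from the 1st batch, but k2 only once from the 2nd batch
        System.out.println(file.size());
    }
}
```

Four records go in but three come out: the loss happens only on the path that buffers records in a key-indexed map, which matches the asymmetry between the 1st and 2nd batches reported in the issue.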