Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
Hudi selects the committed consistent-hashing bucket metadata file based on the replace commit recorded on the active timeline. However, once Hudi archives the timeline, it falls back to the default consistent-hashing bucket metadata file, 00000000000000.hashing_meta, which results in duplicate records being written to the table.
This behaviour leaves duplicate data in the Hudi table and causes subsequent clustering operations to fail, because the file groups on storage no longer match the file groups listed in the metadata files.
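To see why the stale default metadata produces duplicates, note that it records an older bucket layout: a key that the committed metadata routes to one bucket can be routed to a different bucket by the default metadata, so an upsert creates a second copy of the record instead of updating the existing one. A minimal sketch (a plain hash-modulo stands in for Hudi's actual consistent-hashing ring, and the bucket counts are hypothetical):

```python
import hashlib

def bucket_of(record_key: str, num_buckets: int) -> int:
    # Stable hash of the record key, reduced to a bucket id.
    digest = hashlib.md5(record_key.encode()).hexdigest()
    return int(digest, 16) % num_buckets

# Hypothetical layouts: the latest replace commit resized the index to 8
# buckets, while the default 00000000000000.hashing_meta still records 4.
committed_buckets = 8
default_buckets = 4

keys = [f"record-{i}" for i in range(100)]
moved = [k for k in keys
         if bucket_of(k, committed_buckets) != bucket_of(k, default_buckets)]
# Every key in `moved` would be written into a different file group than the
# one already holding it, i.e. the table ends up with a duplicate record.
print(f"{len(moved)} of {len(keys)} keys land in a different bucket")
```

Any key whose bucket assignment differs between the two layouts is duplicated on the next write, which is also why later clustering sees file groups on storage that the metadata does not account for.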
Check the loadMetadata function of the consistent hashing index implementation.
Let me know if anything else is needed.
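The faulty lookup can be sketched as follows. This is not Hudi's actual code, only an illustration of the described behaviour: the metadata file is chosen by matching file names against instants still on the active timeline, so once the relevant replace commit is archived, the loop finds no match and the stale default file wins even though a newer metadata file exists on storage.

```python
from pathlib import Path

DEFAULT_INSTANT = "00000000000000"  # instant of the default hashing metadata

def load_hashing_metadata(metadata_dir: Path, active_instants: list[str]) -> Path:
    """Pick the hashing metadata file whose instant is the latest one still
    present on the active timeline (hypothetical helper, not Hudi's API)."""
    candidates = sorted(
        metadata_dir.glob("*.hashing_meta"),
        key=lambda p: p.stem,
        reverse=True,  # newest instant first
    )
    for path in candidates:
        if path.stem in active_instants:
            return path
    # BUG described above: after archival the replace commit is no longer in
    # active_instants, so we silently fall back to the stale default file.
    return metadata_dir / f"{DEFAULT_INSTANT}.hashing_meta"
```

Under this sketch, a fix would need a fallback that consults the archived timeline (or simply the newest metadata file on storage) before resorting to the default.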