Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
The HBase index is sometimes inconsistent with the data. The main cause is that task results are not idempotent: a task that runs twice may produce a different bucket assignment each time.
- Hudi on Spark caches write statuses on the executors. If an executor exits before the commit, its write statuses are regenerated. However, the HBase index has already been updated from the previous write statuses and is not updated again from the new ones.
- When speculation is enabled for bulk insert, the HBase index is updated concurrently by multiple attempts of the same task. Although only one attempt can succeed, that does not guarantee that every index entry was written by that attempt. Entries written by the other, failed attempts may be inconsistent with the data.
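
The first scenario can be illustrated with a toy model. This is only a sketch of the failure mode, not Hudi code: `run_task`, the `index` dict, and the counter-based file IDs are hypothetical stand-ins for a task attempt, the HBase index, and whatever makes the bucket assignment differ between attempts.

```python
import itertools

# Hypothetical stand-in for the non-deterministic part of bucket assignment:
# every task attempt gets a fresh file ID.
_file_ids = itertools.count()


def run_task(record_keys):
    """Simulate one task attempt and return its write status (key -> file ID)."""
    file_id = f"file-{next(_file_ids)}"
    return {key: file_id for key in record_keys}


record_keys = ["k1", "k2"]
index = {}  # toy replacement for the HBase index

# First attempt: the index is updated from this write status.
first = run_task(record_keys)
index.update(first)

# The executor exits before the commit, so the cached write status is lost
# and the task is recomputed. The index is NOT updated again.
second = run_task(record_keys)
committed = second

# Keys whose index entry points at a file that was never committed.
stale = sorted(k for k in record_keys if index[k] != committed[k])
print(stale)
```

In this model every key ends up stale, because the index still points at the file ID from the first attempt while the commit uses the one from the second. The speculation case follows the same pattern, with two attempts writing to the index concurrently instead of one after the other.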