  1. Apache Hudi
  2. HUDI-4912

Make write status idempotent

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: index
    • Labels: None

    Description

      HBase index updates are sometimes inconsistent with the data. The main cause is that a task's result is not idempotent: a task that runs twice may produce a different bucket assignment each time.

      • Hudi on Spark caches write statuses on the executor. If an executor exits before the commit, the write status is regenerated. However, the HBase index has already been updated from the previous write status and will not be updated from the new one.
      • When speculation is used in bulk insert, the HBase index is updated concurrently by multiple task attempts. Although only one attempt can succeed, that does not mean every index entry was written by that attempt. Entries written by the failed attempts may be inconsistent with the data.
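
      The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not Hudi's actual code: ToyIndex, AssignmentSketch, and both assign methods are hypothetical names, and the first-writer-wins index is a simplifying assumption that mirrors "will not be updated by new write status". It contrasts an assignment that changes on every attempt with one that is a pure function of the record key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/** Hypothetical stand-in for an external index (record key -> file id). */
final class ToyIndex {
    private final Map<String, String> locations = new HashMap<>();

    // Assumption: the first writer wins, so a later attempt cannot overwrite an entry.
    void putIfAbsent(String recordKey, String fileId) {
        locations.putIfAbsent(recordKey, fileId);
    }

    String get(String recordKey) {
        return locations.get(recordKey);
    }
}

final class AssignmentSketch {
    /** Non-idempotent: every attempt produces a fresh random file id. */
    static String randomAssign(String recordKey) {
        return UUID.randomUUID().toString();
    }

    /** Idempotent: the file id depends only on the record key and the bucket count. */
    static String deterministicAssign(String recordKey, int numBuckets) {
        int bucket = Math.floorMod(recordKey.hashCode(), numBuckets);
        return "bucket-" + bucket;
    }

    /**
     * Simulates two attempts of the same task. Attempt 1 updates the index and is
     * then lost before commit; attempt 2 regenerates the write status and commits.
     * Returns true if the index agrees with the committed locations.
     */
    static boolean indexMatchesCommittedData(Function<String, String> assigner,
                                             List<String> keys) {
        ToyIndex index = new ToyIndex();

        // Attempt 1: index entries are written, but the data is never committed.
        for (String key : keys) {
            index.putIfAbsent(key, assigner.apply(key));
        }

        // Attempt 2: locations are recomputed and these are the ones committed.
        Map<String, String> committed = new HashMap<>();
        for (String key : keys) {
            String fileId = assigner.apply(key);
            committed.put(key, fileId);
            index.putIfAbsent(key, fileId);
        }

        for (String key : keys) {
            if (!committed.get(key).equals(index.get(key))) {
                return false;
            }
        }
        return true;
    }
}
```

      With the deterministic assigner, a retried or speculative attempt writes the same index entries as the attempt that commits, so the order of index updates no longer matters. With the random assigner, the index keeps the locations from the lost attempt and diverges from the committed data.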

          People

            Assignee: Unassigned
            Reporter: ZiyueGuan (guanziyue)