Apache Hudi / HUDI-7447

Fix: do not bootstrap when a subtask restarts while the OperatorCoordinator has not finished handling notifyCheckpointComplete


Details

    Description

      1. In insert mode, a subtask restarts while the OperatorCoordinator has been inside notifyCheckpointComplete for checkpoint 100 for a long time. This can happen when a table service spends a long time scanning HDFS, or when HDFS operations during rollback or initInstant are slow.
      2. At this point, ckp_meta/instantId.INFLIGHT has not yet been completed, although the corresponding commit file has already been written. The restarted subtask therefore sends a bootstrap event.
      3. Once the OperatorCoordinator finishes notifyCheckpointComplete, it creates a new instant, and the subtask writes the corresponding parquet files (and other data) under that instant.
      4. The OperatorCoordinator then processes the bootstrap event, creates yet another instant, and rolls back the instant created in step 3. From this point on, the OperatorCoordinator and the write operator are inconsistent.
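The sequence in steps 1-4 can be replayed with a toy model. This is a minimal sketch, not Hudi code: the Coordinator class, its on_checkpoint_complete/on_bootstrap handlers, and the integer instant IDs (real Hudi instants are timestamps) are hypothetical. It only assumes what the steps imply: the coordinator handles events one at a time, so the bootstrap event waits behind notifyCheckpointComplete.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Coordinator:
    """Hypothetical OperatorCoordinator model: handles queued events one at a time."""
    next_instant: int = 101                     # hypothetical instant counter
    inflight: Optional[int] = None              # instant the coordinator considers current
    rolled_back: List[int] = field(default_factory=list)

    def _start_instant(self) -> int:
        self.inflight = self.next_instant
        self.next_instant += 1
        return self.inflight

    def on_checkpoint_complete(self, checkpoint_id: int) -> int:
        # Step 3: the previous instant is committed, so a new instant is started.
        return self._start_instant()

    def on_bootstrap(self) -> int:
        # Step 4: the bootstrap event rolls back the inflight instant and starts another.
        if self.inflight is not None:
            self.rolled_back.append(self.inflight)
        return self._start_instant()


def replay_race():
    coordinator = Coordinator()
    # Steps 1-2: the subtask restarts while notifyCheckpointComplete(100) is still
    # running, so its bootstrap event is queued behind the checkpoint event.
    events = deque([("checkpoint_complete", 100), ("bootstrap", None)])
    subtask_instant = None
    while events:
        kind, payload = events.popleft()
        if kind == "checkpoint_complete":
            # The subtask picks up the new instant and writes parquet files for it.
            subtask_instant = coordinator.on_checkpoint_complete(payload)
        else:
            coordinator.on_bootstrap()
    return coordinator, subtask_instant


coordinator, subtask_instant = replay_race()
print(f"subtask writes to {subtask_instant}, coordinator expects {coordinator.inflight}, "
      f"rolled back {coordinator.rolled_back}")
```

Running it prints "subtask writes to 101, coordinator expects 102, rolled back [101]": the subtask's files belong to an instant the coordinator has already rolled back, which is the inconsistency described in step 4.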

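      The root cause lies in Hudi's three-stage commit: snapshot the data, write the commit file, then write the ckp_meta file. A subtask that restarts between the last two stages sees the instant as committed while its ckp_meta entry is still INFLIGHT.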
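The three stages can be modeled to show exactly where the restart in step 2 lands. This is again a hypothetical sketch: the Stage enum, restart_view, and the COMPLETED state name are assumptions for illustration. Only the stage order (data snapshot, commit file, ckp_meta file) comes from this issue.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical model of the three commit stages, in the order given in the issue."""
    DATA_SNAPSHOT = 1   # data files flushed for the checkpoint
    COMMIT_FILE = 2     # commit file written, instant visible as committed
    CKP_META = 3        # ckp_meta entry for the instant completed


def restart_view(stage_reached: Stage) -> dict:
    """What a subtask restarting right after `stage_reached` would observe."""
    committed = stage_reached >= Stage.COMMIT_FILE
    ckp_meta_complete = stage_reached >= Stage.CKP_META
    return {
        "committed": committed,
        # "COMPLETED" is an assumed name for the finished ckp_meta state.
        "ckp_meta_state": "COMPLETED" if ckp_meta_complete else "INFLIGHT",
        # The window from this issue: committed, but ckp_meta still INFLIGHT.
        "in_race_window": committed and not ckp_meta_complete,
    }


for stage in Stage:
    print(stage.name, restart_view(stage))
```

Only a restart after COMMIT_FILE and before CKP_META falls into the window; restarts at the other two stages see a consistent pair of states.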

    People

      Assignee: Unassigned
      Reporter: Wenbing Shen (shenwenbing)
      Votes: 0
      Watchers: 2
