FLINK-32754

Using SplitEnumeratorContext.metricGroup() in restoreEnumerator causes NPE

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: 1.17.0, 1.17.1
    • None
    • None

    Description

      We registered some metrics in the enumerator of a FLIP-27 source via `SplitEnumeratorContext.metricGroup()`, but found that the JobManager logs an NPE when the job restores, suggesting that `SplitEnumeratorContext.metricGroup()` returns null at that point.
      Meanwhile, the job does not fail over, and checkpoints cannot be created successfully even after the job reaches the running state.

      We found that the context is backed by `LazyInitializedCoordinatorContext`, whose metric group is only initialized once `lazyInitialize()` has been called. Reviewing the code, we found that `lazyInitialize()` has not yet been called when `SourceCoordinator.resetToCheckpoint()` runs, so an NPE is thrown.
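
The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not Flink's actual code: `LazyContextSketch` and `restoreThrowsNpe` are hypothetical names, and the metric group is modeled as a plain `Object`. It only assumes that the field stays null until `lazyInitialize()` runs.

```java
import java.util.Objects;

// Hypothetical stand-in for a lazily initialized coordinator context; not Flink's class.
class LazyContextSketch {
    private Object metricGroup; // stays null until lazyInitialize() runs

    void lazyInitialize(Object metricGroup) {
        this.metricGroup = Objects.requireNonNull(metricGroup);
    }

    Object metricGroup() {
        return metricGroup;
    }

    // Mimics a restore path that uses the metric group before initialization.
    static boolean restoreThrowsNpe(LazyContextSketch context) {
        try {
            context.metricGroup().hashCode(); // dereference fails if still null
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        LazyContextSketch context = new LazyContextSketch();
        System.out.println("NPE before lazyInitialize: " + restoreThrowsNpe(context));
        context.lazyInitialize(new Object());
        System.out.println("NPE after lazyInitialize: " + restoreThrowsNpe(context));
    }
}
```

In this sketch the restore path fails only when it runs before initialization, which matches the order of calls reported here.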

      Q: Why does this bug prevent the task from creating checkpoints?
      `SourceCoordinator.resetToCheckpoint()` throws an NPE, which leaves the member variable `enumerator` in `SourceCoordinator` null. Unfortunately, all checkpoint-related calls in `SourceCoordinator` go through `runInEventLoop()`.
      In `runInEventLoop()`, if the enumerator is null, the method returns immediately without running the action.
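
A simplified sketch of why this guard stalls checkpointing, under the assumption that the checkpoint result is a future completed only inside the guarded action. `EventLoopGuardSketch` is a hypothetical class, and the enumerator is modeled as a plain `Object`; the real `SourceCoordinator` is considerably more involved.

```java
import java.util.concurrent.CompletableFuture;

// Simplified, hypothetical stand-in for a coordinator; not Flink's SourceCoordinator.
class EventLoopGuardSketch {
    private Object enumerator; // remains null if restoring the enumerator failed

    void setEnumerator(Object enumerator) {
        this.enumerator = enumerator;
    }

    // Mirrors the guard described above: skip the action when there is no enumerator.
    private void runInEventLoop(Runnable action) {
        if (enumerator == null) {
            return; // the caller is not told that nothing happened
        }
        action.run();
    }

    // The checkpoint result is only completed inside the guarded action.
    void checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> result) {
        runInEventLoop(() -> result.complete(new byte[0]));
    }

    public static void main(String[] args) {
        EventLoopGuardSketch coordinator = new EventLoopGuardSketch();
        CompletableFuture<byte[]> stuck = new CompletableFuture<>();
        coordinator.checkpointCoordinator(1L, stuck);
        System.out.println("completed without enumerator: " + stuck.isDone());

        coordinator.setEnumerator(new Object());
        CompletableFuture<byte[]> ok = new CompletableFuture<>();
        coordinator.checkpointCoordinator(2L, ok);
        System.out.println("completed with enumerator: " + ok.isDone());
    }
}
```

With a null enumerator the future is neither completed nor failed, so whoever waits on it can only time out.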

      Q: Why doesn't this bug trigger a task failover?
      In `RecreateOnResetOperatorCoordinator.resetAndStart()`, if `internalCoordinator.resetToCheckpoint` throws an exception, the exception is caught and `cleanAndFailJob` is called to try to fail the job.
      However, `globalFailureHandler` is also initialized in `lazyInitialize()`, so `globalFailureHandler.handleGlobalFailure(e)` itself triggers an NPE, which `schedulerExecutor.execute` ignores.
      Thus the failure is never reported, and it appears that the task did not fail over.
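
One way an exception can vanish inside an executor is illustrated below. This is a sketch using the JDK's scheduled thread pool, whose `execute()` wraps the task in a future and therefore captures anything the task throws. Whether Flink's scheduler executor behaves exactly like this is an assumption; `SwallowedFailureSketch`, `FailureHandler`, and `exceptionSurfaced` are hypothetical names.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class SwallowedFailureSketch {
    // Hypothetical one-method handler, standing in for a global failure handler.
    interface FailureHandler {
        void handleGlobalFailure(Throwable cause);
    }

    // Runs the task on a scheduled executor and reports whether any exception
    // reached the worker thread's uncaught-exception handler.
    static boolean exceptionSurfaced(Runnable task) {
        AtomicBoolean surfaced = new AtomicBoolean(false);
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread thread = new Thread(runnable);
                    thread.setUncaughtExceptionHandler((t, e) -> surfaced.set(true));
                    return thread;
                });
        executor.execute(task); // the task is wrapped in a future internally
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return surfaced.get();
    }

    public static void main(String[] args) {
        FailureHandler handler = null; // not yet initialized, as in the report
        boolean surfaced = exceptionSurfaced(
                () -> handler.handleGlobalFailure(new RuntimeException("restore failed")));
        System.out.println("NPE surfaced: " + surfaced);
    }
}
```

The NPE raised by calling a method on the null handler is stored in the discarded future, so nothing is logged or rethrown unless the task catches and reports it itself.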

            People

              Assignee: Unassigned
              Reporter: Yu Chen