Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5762

Protect initializeState() and open() by the same lock.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.3.0
    • Component/s: DataStream API
    • Labels:
      None

      Description

      Currently the initializeState() of all operators in a task is called without the checkpoint lock, and before the open(). This may lead to problematic situations as the following:

      In the case that we retrieve timers from a checkpoint, e.g. WindowOperator and (future) CEP, if we re-register them in the initializeState(), then if they fire before the open() of the downstream operators is called, we will have a task failure, as the downstream channels are not open.

      To avoid this, we can put the initializeState() in the same lock as the open(), and the two operations will happen while being protected by the same lock, which also keeps timers from firing.

        Issue Links

          Activity

          Hide
          aljoscha Aljoscha Krettek added a comment -

          +1

          Show
          aljoscha Aljoscha Krettek added a comment - +1
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user kl0u opened a pull request:

          https://github.com/apache/flink/pull/3291

          FLINK-5762 Protect initializeState() and open() by the same lock

          The description is in the related jira:

          https://issues.apache.org/jira/browse/FLINK-5762

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/kl0u/flink init-state-lock

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3291.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3291


          commit 3b2806895a433a1cf87bda43dd50fd102d2e9b6d
          Author: kl0u <kkloudas@gmail.com>
          Date: 2017-02-09T15:02:27Z

          FLINK-5762 Protect initializeState() and open() by the same lock


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user kl0u opened a pull request: https://github.com/apache/flink/pull/3291 FLINK-5762 Protect initializeState() and open() by the same lock The description is in the related jira: https://issues.apache.org/jira/browse/FLINK-5762 You can merge this pull request into a Git repository by running: $ git pull https://github.com/kl0u/flink init-state-lock Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3291.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3291 commit 3b2806895a433a1cf87bda43dd50fd102d2e9b6d Author: kl0u <kkloudas@gmail.com> Date: 2017-02-09T15:02:27Z FLINK-5762 Protect initializeState() and open() by the same lock
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3291

          I think this works for now.

          What would be great is in the long run to actually start the timer service only in/after open() so that timers that are registered cannot fire.

          That way we can make all the state initialization more eager and avoid also all the misleading null errors logged when cancellation happens too early.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3291 I think this works for now. What would be great is in the long run to actually start the timer service only in/after open() so that timers that are registered cannot fire. That way we can make all the state initialization more eager and avoid also all the misleading null errors logged when cancellation happens too early.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3291

          Merging this...

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3291 Merging this...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user kl0u commented on the issue:

          https://github.com/apache/flink/pull/3291

          Thanks a lot @StephanEwen !

          Show
          githubbot ASF GitHub Bot added a comment - Github user kl0u commented on the issue: https://github.com/apache/flink/pull/3291 Thanks a lot @StephanEwen !
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3291

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3291
          Hide
          StephanEwen Stephan Ewen added a comment -

          Fixed via a91b6ff05d8af870ad076f9bf0fc17886787bc46

          Show
          StephanEwen Stephan Ewen added a comment - Fixed via a91b6ff05d8af870ad076f9bf0fc17886787bc46

            People

            • Assignee:
              kkl0u Kostas Kloudas
              Reporter:
              kkl0u Kostas Kloudas
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development