FLINK-19154: Application mode deletes HA data in case of suspended ZooKeeper connection


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.11.1, 1.12.0
    • Fix Version/s: 1.11.3, 1.12.0
    • Environment: Run a stand-alone cluster that runs a single job (if you are familiar with the way Ververica Platform runs Flink jobs, we use a very similar approach). It runs Flink 1.11.1 straight from the official Docker image.

    Description

      A user reported that Flink's application mode deletes HA data in case of a suspended ZooKeeper connection [1].

      The problem seems to be that the ApplicationDispatcherBootstrap class produces an exception (stating that the requested job can no longer be found, because the ZooKeeper connection was lost), which is interpreted as a job failure. As a result, the cluster is shut down with a terminal state of FAILED, which causes the HA data to be cleaned up. The exception originates in JobStatusPollingUtils.getJobResult, which is called by ApplicationDispatcherBootstrap.getJobResult().

      The behaviour described above can be seen in this log [2].
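
To make the failure mode concrete, here is a minimal, self-contained sketch of the status mapping described in this report. It does not use Flink's classes: FinalStatus, JobExecutionFailed, JobLookupFailed, and both mapping methods are hypothetical names chosen for illustration. The assumption that a non-terminal UNKNOWN status would leave HA data in place is also ours and is not taken from this issue.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class HaCleanupSketch {

    /** Hypothetical stand-in for the cluster's final status (not Flink's own type). */
    enum FinalStatus { SUCCEEDED, FAILED, UNKNOWN }

    /** Hypothetical: the job itself ran and failed. */
    static class JobExecutionFailed extends RuntimeException {
        JobExecutionFailed(String message) { super(message); }
    }

    /** Hypothetical: the job could not be looked up, e.g. ZooKeeper connection suspended. */
    static class JobLookupFailed extends RuntimeException {
        JobLookupFailed(String message) { super(message); }
    }

    /** Behaviour as reported: every exception while polling is read as a job failure. */
    static FinalStatus naiveStatus(CompletableFuture<String> jobResult) {
        try {
            jobResult.join();
            return FinalStatus.SUCCEEDED;
        } catch (CompletionException e) {
            return FinalStatus.FAILED;
        }
    }

    /** One possible alternative: only a real job failure maps to FAILED. */
    static FinalStatus cautiousStatus(CompletableFuture<String> jobResult) {
        try {
            jobResult.join();
            return FinalStatus.SUCCEEDED;
        } catch (CompletionException e) {
            // join() wraps the original exception; inspect the cause.
            return e.getCause() instanceof JobExecutionFailed
                    ? FinalStatus.FAILED
                    : FinalStatus.UNKNOWN;
        }
    }

    /** Assumption for this sketch: only terminal statuses trigger HA cleanup. */
    static boolean cleansUpHaData(FinalStatus status) {
        return status != FinalStatus.UNKNOWN;
    }

    public static void main(String[] args) {
        CompletableFuture<String> lostLookup = new CompletableFuture<>();
        lostLookup.completeExceptionally(
                new JobLookupFailed("job not found: ZooKeeper connection suspended"));

        FinalStatus naive = naiveStatus(lostLookup);
        FinalStatus cautious = cautiousStatus(lostLookup);
        System.out.println("naive:    " + naive + ", cleanup=" + cleansUpHaData(naive));
        System.out.println("cautious: " + cautious + ", cleanup=" + cleansUpHaData(cautious));
    }
}
```

With the lookup failure constructed in main, the naive mapping yields FAILED and would trigger cleanup, while the cautious mapping yields UNKNOWN and would not. Whether the actual fix takes this shape is not stated in this issue.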

      [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoint-metadata-deleted-by-Flink-after-ZK-connection-issues-td37937.html
      [2] https://pastebin.com/raw/uH9KDU2L

          People

            Assignee: Kostas Kloudas (kkl0u)
            Reporter: Husky Zeng (568793005@qq.com)
            Votes: 1
            Watchers: 12
