Flink / FLINK-17470

Flink task executor process permanently hangs on `flink-daemon.sh stop`, deletes PID file


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.12.0
    • Component/s: Runtime / Coordination
    • Release Note: In Flink 1.12 we changed the behavior of the standalone scripts to issue a SIGKILL if a SIGTERM did not succeed in shutting down a Flink process.
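
    The SIGTERM-then-SIGKILL escalation mentioned in the release note can be sketched roughly as follows. This is only an illustration of the general pattern, not the code in flink-daemon.sh: the function name `stop_with_escalation` and the 10-second default grace period are hypothetical.

```shell
# Hypothetical helper illustrating SIGTERM -> SIGKILL escalation.
# This is NOT the actual flink-daemon.sh implementation.
# Usage: stop_with_escalation <pid> [grace_period_seconds]
stop_with_escalation() {
    local pid="$1"
    local timeout="${2:-10}"   # assumed default grace period, in seconds
    local waited=0

    # Nothing to do if the process is already gone.
    if ! kill -0 "$pid" 2>/dev/null; then
        return 0
    fi

    # Ask the process to shut down gracefully.
    kill -TERM "$pid" 2>/dev/null || true

    # Poll once per second until the process exits or the grace period ends.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful shutdown did not finish in time: force termination.
            kill -KILL "$pid" 2>/dev/null || true
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

    A caller would typically read the PID from its PID file, call `stop_with_escalation "$pid" 10`, and remove the PID entry only afterwards, so that a hung process is not left running without a record of it.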

    Description

      Hi Flink team!

      We've attempted to upgrade our Flink 1.9 cluster to 1.10, but are experiencing reproducible instability on shutdown. Specifically, it appears that the `kill` issued in the `stop` case of flink-daemon.sh is causing the task executor process to hang permanently. The process seems to be hanging in `org.apache.flink.runtime.util.JvmShutdownSafeguard$DelayedTerminator.run`, in a `Thread.sleep()` call, which strikes me as bizarre behavior. Also note that every thread in the process is BLOCKED on a `pthread_cond_wait` call. Is this an OS-level issue? Banging my head on a wall here. See the attached stack traces for details.



          People

            Assignee: Robert Metzger (rmetzger)
            Reporter: Hunter Herman (hherman)
            Votes: 0
            Watchers: 6
