FLINK-17470

Flink task executor process permanently hangs on `flink-daemon.sh stop`, deletes PID file

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.10.0
    • Fix Version/s: 1.12.0
    • Component/s: Runtime / Coordination
    • Release Note: In Flink 1.12 we changed the behavior of the standalone scripts to issue a SIGKILL if a SIGTERM did not succeed in shutting down a Flink process.
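
The SIGTERM-then-SIGKILL behavior described in the release note can be sketched roughly as follows. This is a minimal illustration, not the actual code from the Flink standalone scripts: the function name `stop_with_escalation`, the 10-second default grace period, and the one-second polling interval are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the real flink-daemon.sh logic.
# Usage: stop_with_escalation PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM (or was already gone),
# 1 if the grace period expired and SIGKILL had to be sent.
# Assumes PID is not an unreaped child of the calling shell; a zombie
# child still answers `kill -0` until its parent waits on it.
stop_with_escalation() {
    local pid="$1"
    local timeout="${2:-10}"   # assumed grace period, in seconds

    # Request a graceful shutdown; if the signal cannot be delivered,
    # treat the process as already stopped.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process exits or the grace period ends.
    local waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Graceful shutdown did not finish in time: force termination.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller would typically remove the PID file only after this function returns, so that the file never disappears while the process is still running.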

    Description

      Hi Flink team!

      We've attempted to upgrade our Flink 1.9 cluster to 1.10, but are experiencing reproducible instability on shutdown. Specifically, it appears that the `kill` issued in the `stop` case of flink-daemon.sh is causing the task executor process to hang permanently. The process seems to be hanging in `org.apache.flink.runtime.util.JvmShutdownSafeguard$DelayedTerminator.run`, inside a `Thread.sleep()` call, which seems like bizarre behavior. Also note that every thread in the process is BLOCKED on a `pthread_cond_wait` call. Is this an OS-level issue? Banging my head on a wall here. See attached stack traces for details.

      Attachments

        1. flink_mixed_jstack.log
          155 kB
          Hunter Herman
        2. flink_jstack.log
          244 kB
          Hunter Herman

            People

              Assignee: Robert Metzger (rmetzger)
              Reporter: Hunter Herman (hherman)
              Votes: 0
              Watchers: 6