Flink / FLINK-5230: Safety nets against leaving dysfunctional JobManagers / FLINK-5232

Add a Thread default uncaught exception handler on the JobManager


Details

    Description

      If a JobManager thread dies because of an uncaught exception, there is a high chance that the JobManager becomes dysfunctional: the JVM keeps running, but the work that thread was responsible for silently stops. We should therefore bring down the whole JobManager process when this happens.

      The only safe option is to kill the process and rely on the JobManager being restarted by YARN / Mesos / Kubernetes / etc.
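A minimal, self-contained sketch of the underlying JVM behavior (the class and method names are illustrative only, not Flink code): an uncaught exception terminates only the thread that threw it, while the rest of the process keeps running.

```java
public class ThreadDeathDemo {

    /** Starts a worker thread that throws, waits for it to die, and reports whether it is still alive. */
    public static boolean workerSurvives() {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated bug in a long-running thread");
        }, "worker");
        // Suppress the default stack-trace printing for this demo thread only.
        worker.setUncaughtExceptionHandler((t, e) -> { });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive();
    }

    public static void main(String[] args) {
        boolean alive = workerSurvives();
        // The worker is gone, yet the main thread (and the JVM) carry on as if nothing happened.
        System.out.println("worker alive: " + alive + "; main thread still running");
    }
}
```

Nothing outside the dead thread is notified, which is why a process-wide handler is needed to turn such a failure into a restart.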

      I suggest adding the following code to the JobManager launch:

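      // LOG is assumed to be the JobManager's SLF4J logger.
      Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      
          @Override
          public void uncaughtException(Thread t, Throwable e) {
              try {
                  // Passing the Throwable as the last argument makes SLF4J log its stack trace.
                  LOG.error("Thread {} died due to an uncaught exception. Killing process.", t.getName(), e);
              } finally {
                  // halt() skips shutdown hooks, so a hanging hook cannot keep the
                  // dysfunctional process alive; the cluster manager then restarts it.
                  Runtime.getRuntime().halt(-1);
              }
          }
      });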
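Because `Runtime.halt` ends the JVM, the handler above is hard to exercise in a unit test. One possible refinement, sketched here under the assumption that the kill action may be made injectable, wraps the same logic in a hypothetical `KillingExceptionHandler` class whose terminator is a `java.util.function.IntConsumer`. Production code would pass `code -> Runtime.getRuntime().halt(code)`; a test passes a recording consumer instead. Logging goes to `System.err` only to keep the sketch dependency-free; the JobManager would use its SLF4J logger.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical handler: logs the dying thread, then invokes an injectable kill action. */
public class KillingExceptionHandler implements Thread.UncaughtExceptionHandler {

    private final IntConsumer terminator;

    public KillingExceptionHandler(IntConsumer terminator) {
        this.terminator = terminator;
    }

    /** Production wiring: halt the JVM immediately, skipping shutdown hooks. */
    public static KillingExceptionHandler halting() {
        return new KillingExceptionHandler(code -> Runtime.getRuntime().halt(code));
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            System.err.println("Thread " + t.getName() + " died due to an uncaught exception. Killing process.");
            e.printStackTrace();
        } finally {
            terminator.accept(-1);
        }
    }

    /** Installs a recording handler as the JVM-wide default, lets one thread die, and returns the exit code it saw. */
    public static int demo() {
        AtomicInteger seen = new AtomicInteger(0);
        CountDownLatch called = new CountDownLatch(1);
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler(new KillingExceptionHandler(code -> {
            seen.set(code);
            called.countDown();
        }));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("simulated failure");
            }, "worker");
            worker.start();
            worker.join();
            called.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            // Restore whatever default handler was installed before the demo.
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("terminator called with exit code " + demo());
    }
}
```

At JobManager startup this would be installed with `Thread.setDefaultUncaughtExceptionHandler(KillingExceptionHandler.halting());`. Note that the default handler only runs for threads that have no handler of their own and whose `ThreadGroup` does not override `uncaughtException`.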
      


          People

            yanghua vinoyang
            sewen Stephan Ewen