Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-20261

Uncaught exception in ExecutorNotifier due to split assignment broken by failed task

    XMLWordPrintableJSON

Details

    Description

      While trying to extend FileSourceTextLinesITCase::testContinuousTextFileSourceWithTaskManagerFailover with recovery test after TM failure (TestingMiniCluster::terminateTaskExecutor, branch) in FLINK-20118, I encountered the following case:

      • SourceCoordinatorContext::assignSplits schedules async assignment (all reader tasks alive)
      • call TestingMiniCluster::terminateTaskExecutor while doing writeFile in a loop of testContinuousTextFileSource
      • causes graceful TaskExecutor::onStop shutdown
      • causes TM/RM disconnect and failing slot allocations in JM by RM
      • eventually causes SourceCoordinatorContext::unregisterSourceReader
      • actual assignment starts (SourceCoordinatorContext::assignSplits: callInCoordinatorThread)
      • registeredReaders.containsKey(subtaskId) check fails (due to failed task) with IllegalArgumentException which is uncaught in single thread executor
      • forces ThreadPool to recreate the single thread
      • calls CoordinatorExecutorThreadFactory::newThread
      • fails expected condition of single thread creation with IllegalStateException which is uncaught
      • calls FatalExitExceptionHandler and exits JVM abruptly
      [SourceCoordinator-Source: file-source] ERROR org.apache.flink.runtime.util.FatalExitExceptionHandler - FATAL: Thread 'SourceCoordinator-Source: file-source' produced an uncaught exception. Stopping the process...
      java.lang.IllegalStateException: Should never happen. This factory should only be used by a SingleThreadExecutor.
      	at org.apache.flink.runtime.source.coordinator.SourceCoordinatorProvider$CoordinatorExecutorThreadFactory.newThread(SourceCoordinatorProvider.java:94) ~[classes/:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) ~[?:1.8.0_172]
      	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) ~[?:1.8.0_172]
      	at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025) ~[?:1.8.0_172]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) ~[?:1.8.0_172]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_172]
      	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
      
      Process finished with exit code 239
      

      Attachments

        Issue Links

          Activity

            People

              sewen Stephan Ewen
              azagrebin Andrey Zagrebin
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: