Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-5745

Set uncaught exception handler for Netty threads

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Network
    • Labels:
      None

      Description

      We pass a thread factory for the Netty event loop threads (see NettyServer and NettyClient), but don't set an uncaught exception handler. Let's add a JVM terminating handler that exits the process in cause of fatal errors.

        Issue Links

          Activity

          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user NicoK opened a pull request:

          https://github.com/apache/flink/pull/3293

          FLINK-5745 set an uncaught exception handler for netty threads

          This adds a JVM-terminating handler that logs errors from uncaught exceptions
          and terminates the process so that critical exceptions are not accidentally
          lost and leave the system running in an inconsistent state.

          It borrows and re-uses code from @StephanEwen from this PR:
          https://github.com/apache/flink/pull/3290

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/NicoK/flink flink-5745

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/flink/pull/3293.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #3293


          commit b2a11267b76c9202e178a315905f4e06efd00e34
          Author: Nico Kruber <nico@data-artisans.com>
          Date: 2017-02-10T11:04:37Z

          FLINK-5745 set an uncaught exception handler for netty threads

          This adds a JVM-terminating handler that logs errors from uncaught exceptions
          and terminates the process so that critical exceptions are not accidentally
          lost and leave the system running in an inconsistent state.

          It borrows and re-uses code from @StephanEwen from this PR:
          https://github.com/apache/flink/pull/3290


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user NicoK opened a pull request: https://github.com/apache/flink/pull/3293 FLINK-5745 set an uncaught exception handler for netty threads This adds a JVM-terminating handler that logs errors from uncaught exceptions and terminates the process so that critical exceptions are not accidentally lost and leave the system running in an inconsistent state. It borrows and re-uses code from @StephanEwen from this PR: https://github.com/apache/flink/pull/3290 You can merge this pull request into a Git repository by running: $ git pull https://github.com/NicoK/flink flink-5745 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3293.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3293 commit b2a11267b76c9202e178a315905f4e06efd00e34 Author: Nico Kruber <nico@data-artisans.com> Date: 2017-02-10T11:04:37Z FLINK-5745 set an uncaught exception handler for netty threads This adds a JVM-terminating handler that logs errors from uncaught exceptions and terminates the process so that critical exceptions are not accidentally lost and leave the system running in an inconsistent state. It borrows and re-uses code from @StephanEwen from this PR: https://github.com/apache/flink/pull/3290
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3293

          Should we instead of copying the uncaught exception handler code move it out to an outer class? You would have to base this PR on #3290, move the class out and only use it here.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 Should we instead of copying the uncaught exception handler code move it out to an outer class? You would have to base this PR on #3290, move the class out and only use it here.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3293

          would be a different LOG handler though - does it make sense to have two or is it enough to have a single one in an outer class?

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 would be a different LOG handler though - does it make sense to have two or is it enough to have a single one in an outer class?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3293

          I think it's logged as `FatalExitExceptionHandler` in both places right now as well (Stephan's PR and your PR). So nothing would change if we move it out, or am I overlooking something?

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 I think it's logged as `FatalExitExceptionHandler` in both places right now as well (Stephan's PR and your PR). So nothing would change if we move it out, or am I overlooking something?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3293

          wouldn't it be `NettyServer$FatalExitExceptionHandler` vs. `ExecutorThreadFactory$FatalExitExceptionHandler`?

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 wouldn't it be `NettyServer$FatalExitExceptionHandler` vs. `ExecutorThreadFactory$FatalExitExceptionHandler`?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3293

          I thought that this is only going to be one, sorry. Imo only one is good. The thread name is logged in the message.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 I thought that this is only going to be one, sorry. Imo only one is good. The thread name is logged in the message.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3293

          Looking at this and at my previous pull request, I am wondering if we should actually define and collect exit codes somewhere globally, like in `flink-core:org.apache.flink.configuration.ExitCodes`.

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 Looking at this and at my previous pull request, I am wondering if we should actually define and collect exit codes somewhere globally, like in `flink-core:org.apache.flink.configuration.ExitCodes`.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3293

          I was actually looking through the code to find something like this but it seems that every class does this locally for now. Global exit codes make sense though - also for documentation

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 I was actually looking through the code to find something like this but it seems that every class does this locally for now. Global exit codes make sense though - also for documentation
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3293

          @uce I'll extract the inner class and use it here as well as soon as the final #3290 is merged

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 @uce I'll extract the inner class and use it here as well as soon as the final #3290 is merged
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3293

          #3290 is in

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 #3290 is in
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3293

          @NicoK I think you can update this now...

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 @NicoK I think you can update this now...
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user NicoK commented on the issue:

          https://github.com/apache/flink/pull/3293

          @StephanEwen already did when #3290 got in

          Show
          githubbot ASF GitHub Bot added a comment - Github user NicoK commented on the issue: https://github.com/apache/flink/pull/3293 @StephanEwen already did when #3290 got in
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user uce commented on the issue:

          https://github.com/apache/flink/pull/3293

          This looks good to me now. Nico moved the `FatalExitExceptionHandler` out of `ExecutorThreadFactory`. I'm going to merge this later today if @StephanEwen has no objections.

          Show
          githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/3293 This looks good to me now. Nico moved the `FatalExitExceptionHandler` out of `ExecutorThreadFactory`. I'm going to merge this later today if @StephanEwen has no objections.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user StephanEwen commented on the issue:

          https://github.com/apache/flink/pull/3293

          +1 to merge this

          Show
          githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3293 +1 to merge this
          Hide
          uce Ufuk Celebi added a comment -

          Fixed in 9b4cd34 (release-1.2).

          Show
          uce Ufuk Celebi added a comment - Fixed in 9b4cd34 (release-1.2).
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/flink/pull/3293

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3293

            People

            • Assignee:
              Unassigned
              Reporter:
              uce Ufuk Celebi
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development