ZooKeeper
  1. ZooKeeper
  2. ZOOKEEPER-1442

Uncaught exception handler should exit on a java.lang.Error

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.4.3, 3.3.5
    • Fix Version/s: None
    • Component/s: java client, server
    • Labels:
      None

      Description

      The uncaught exception handler registered in NIOServerCnxnFactory and ClientCnxn simply logs exceptions and lets the rest of ZooKeeper go on its merry way. However, errors such as OutOfMemoryErrors should really crash the program, as they represent unrecoverable errors. If the exception that gets to the uncaught exception handler is an instanceof a java.lang.Error, ZK should exit with an error code (in addition to logging the error).

      1. ZOOKEEPER-1442.patch
        1 kB
        Jeremy Stribling
      2. ZOOKEEPER-1442.patch
        20 kB
        Jeremy Stribling
      3. ZOOKEEPER-1442.patch
        29 kB
        Jeremy Stribling

        Activity

        Hide
        Jeremy Stribling added a comment -

        It looks nice, I just have one small comment about reflection usage. I don't see any logic to use reflection until not supporting new user defined custom strategies, instead can we make it simple by putting the instance directly to the HashMap?

        I was just trying to avoid the unnecessary object construction, though admittedly it probably doesn't save much work. I could take out the reflection if others agree it's over the top.

        If I took it out, I would probably make name() be a member of the interface, and then just iterate over an array of constructed instances to find the match.

        Show
        Jeremy Stribling added a comment - It looks nice, I just have one small comment about reflection usage. I don't see any logic to use reflection until not supporting new user defined custom strategies, instead can we make it simple by putting the instance directly to the HashMap? I was just trying to avoid the unnecessary object construction, though admittedly it probably doesn't save much work. I could take out the reflection if others agree it's over the top. If I took it out, I would probably make name() be a member of the interface, and then just iterate over an array of constructed instances to find the match.
        Hide
        Rakesh R added a comment -

        It looks nice, I just have one small comment about reflection usage. I don't see any logic to use reflection until not supporting new user defined custom strategies, instead can we make it simple by putting the instance directly to the HashMap?
        like,

                handlerMap.put(LogOnlyExceptionHandler.name().toLowerCase(),
                               new LogOnlyExceptionHandler());
        
        Show
        Rakesh R added a comment - It looks nice, I just have one small comment about reflection usage. I don't see any logic to use reflection until not supporting new user defined custom strategies, instead can we make it simple by putting the instance directly to the HashMap? like, handlerMap.put(LogOnlyExceptionHandler.name().toLowerCase(), new LogOnlyExceptionHandler());
        Hide
        Jeremy Stribling added a comment -

        Here's another try addressing the review comments. I also added code to the Netty factory to install the specified exception handler. I'd appreciate another review.

        Show
        Jeremy Stribling added a comment - Here's another try addressing the review comments. I also added code to the Netty factory to install the specified exception handler. I'd appreciate another review.
        Hide
        Patrick Hunt added a comment -

        I thought from the email discussion that the default behavior shouldn't change until the next major release (e.g., 4.0).

        I missed that discussion, sorry. As long as we maintain backward compatibility we can make such a change in a "minor" release. 3.4 to 3.5 say. And in this case we are maintaining the b/w compat - you can configure the server as you wish. Additionally there are other cases where the server will exit, and you have to handle that (see the supervisor section of the admin guide). So really we're not changing the general behavior of the server here.

        @rakesh - there's no need to have the type if you go to a strategy class approach, really in that case you don't want to have such.

        Show
        Patrick Hunt added a comment - I thought from the email discussion that the default behavior shouldn't change until the next major release (e.g., 4.0). I missed that discussion, sorry. As long as we maintain backward compatibility we can make such a change in a "minor" release. 3.4 to 3.5 say. And in this case we are maintaining the b/w compat - you can configure the server as you wish. Additionally there are other cases where the server will exit, and you have to handle that (see the supervisor section of the admin guide). So really we're not changing the general behavior of the server here. @rakesh - there's no need to have the type if you go to a strategy class approach, really in that case you don't want to have such.
        Hide
        Rakesh R added a comment -

        Sorry to jump in late here. Its really a great patch and following are just some thoughts from mine,

        1) It would be good moving the literals "logOnly", "logAndExitOnError", "none" to the enum DefaultUncaughtExceptionStrategyType and expose a method getType() or type(), so will avoid hardcoding in multiple places.

        2) I agree, presently there is no exception handlers for NettyServer, but I feel would be great provide these strategies to this also. what others feel?

        Show
        Rakesh R added a comment - Sorry to jump in late here. Its really a great patch and following are just some thoughts from mine, 1) It would be good moving the literals "logOnly", "logAndExitOnError", "none" to the enum DefaultUncaughtExceptionStrategyType and expose a method getType() or type(), so will avoid hardcoding in multiple places. 2) I agree, presently there is no exception handlers for NettyServer, but I feel would be great provide these strategies to this also. what others feel?
        Hide
        Jeremy Stribling added a comment -

        Thanks for the reviews!

        1) indeed the line length formatting needs to be fixed, 80 char max (some wiggle room but generally 80 is max)

        Ok great, that's what I prefer too but I saw a lot of wiggling in other files and thought there might be a preference for long lines. Will fix.

        2) the default should be log and exit for 3.5. not exiting is really a bug, and since it's configurable I think it's fine to be b/w compatible when it comes to the new release. (so you'd have two patches that are slightly different, that's ok). I would mark this jira as "incompatible" and add a release note to this jira for inclusion in the release notes of the effected releases

        I thought from the email discussion that the default behavior shouldn't change until the next major release (e.g., 4.0). I was planning to create a reminder JIRA for 4.0 that we should change the default to "log and exit". More than happy to change the default, but I just want to make sure everyone's on the same page with that.

        3) this is a bigger comment. You shouldn't be using constants to mark "logonly" "logandexit" etc.... Rather you should be configuring which class to handle the strategy.

        a) Define an interface for this strategy
        b) Implement three default classes to be available for use in the config file. (log/logexit/none)
        c) document this (intf and concrete classes) in the guide

        the nice thing about this approach is say that someone comes along and needs a fourth option. Say it's osgi specific, or "page me", etc... They can implement their own class and then just specify that strategy in the config file. It will be much more adaptable going forward.

        That sounds good. I'll work on that when I get some time.

        Show
        Jeremy Stribling added a comment - Thanks for the reviews! 1) indeed the line length formatting needs to be fixed, 80 char max (some wiggle room but generally 80 is max) Ok great, that's what I prefer too but I saw a lot of wiggling in other files and thought there might be a preference for long lines. Will fix. 2) the default should be log and exit for 3.5. not exiting is really a bug, and since it's configurable I think it's fine to be b/w compatible when it comes to the new release. (so you'd have two patches that are slightly different, that's ok). I would mark this jira as "incompatible" and add a release note to this jira for inclusion in the release notes of the effected releases I thought from the email discussion that the default behavior shouldn't change until the next major release (e.g., 4.0). I was planning to create a reminder JIRA for 4.0 that we should change the default to "log and exit". More than happy to change the default, but I just want to make sure everyone's on the same page with that. 3) this is a bigger comment. You shouldn't be using constants to mark "logonly" "logandexit" etc.... Rather you should be configuring which class to handle the strategy. a) Define an interface for this strategy b) Implement three default classes to be available for use in the config file. (log/logexit/none) c) document this (intf and concrete classes) in the guide the nice thing about this approach is say that someone comes along and needs a fourth option. Say it's osgi specific, or "page me", etc... They can implement their own class and then just specify that strategy in the config file. It will be much more adaptable going forward. That sounds good. I'll work on that when I get some time.
        Hide
        Patrick Hunt added a comment -

        This is a great idea and a very nice patch. Thanks! However I do have a few comments.

        1) indeed the line length formatting needs to be fixed, 80 char max (some wiggle room but generally 80 is max)

        2) the default should be log and exit for 3.5. not exiting is really a bug, and since it's configurable I think it's fine to be b/w compatible when it comes to the new release. (so you'd have two patches that are slightly different, that's ok). I would mark this jira as "incompatible" and add a release note to this jira for inclusion in the release notes of the effected releases

        3) this is a bigger comment. You shouldn't be using constants to mark "logonly" "logandexit" etc.... Rather you should be configuring which class to handle the strategy.

        a) Define an interface for this strategy
        b) Implement three default classes to be available for use in the config file. (log/logexit/none)
        c) document this (intf and concrete classes) in the guide

        the nice thing about this approach is say that someone comes along and needs a fourth option. Say it's osgi specific, or "page me", etc... They can implement their own class and then just specify that strategy in the config file. It will be much more adaptable going forward.

        Show
        Patrick Hunt added a comment - This is a great idea and a very nice patch. Thanks! However I do have a few comments. 1) indeed the line length formatting needs to be fixed, 80 char max (some wiggle room but generally 80 is max) 2) the default should be log and exit for 3.5. not exiting is really a bug, and since it's configurable I think it's fine to be b/w compatible when it comes to the new release. (so you'd have two patches that are slightly different, that's ok). I would mark this jira as "incompatible" and add a release note to this jira for inclusion in the release notes of the effected releases 3) this is a bigger comment. You shouldn't be using constants to mark "logonly" "logandexit" etc.... Rather you should be configuring which class to handle the strategy. a) Define an interface for this strategy b) Implement three default classes to be available for use in the config file. (log/logexit/none) c) document this (intf and concrete classes) in the guide the nice thing about this approach is say that someone comes along and needs a fourth option. Say it's osgi specific, or "page me", etc... They can implement their own class and then just specify that strategy in the config file. It will be much more adaptable going forward.
        Hide
        Michi Mutsuzaki added a comment -

        +1.

        The patch looks good to me. I'd like to wait for +1 from another committer before checking this in.

        Feel free to hammer me on style. I erred on the side of verbosity with variable names, but couldn't find anything stating the preferred line length so sometimes lines get a bit long.

        Yeah I guess we don't enforce line length with tools like checkstyle. Not sure if we have an official coding style guide.

        Thanks!
        --Michi

        Show
        Michi Mutsuzaki added a comment - +1. The patch looks good to me. I'd like to wait for +1 from another committer before checking this in. Feel free to hammer me on style. I erred on the side of verbosity with variable names, but couldn't find anything stating the preferred line length so sometimes lines get a bit long. Yeah I guess we don't enforce line length with tools like checkstyle. Not sure if we have an official coding style guide. Thanks! --Michi
        Hide
        Jeremy Stribling added a comment -

        Here's an attempt to make the default uncaught exception strategy parameterized. I'd appreciate a review.

        It changes the system behavior slightly by setting the handler only when an NIOServerCnxnFactory is constructed, instead of as static class code. That means it will overwrite the handler for each new factory, if the strategy is not "none".

        Feel free to hammer me on style. I erred on the side of verbosity with variable names, but couldn't find anything stating the preferred line length so sometimes lines get a bit long.

        Show
        Jeremy Stribling added a comment - Here's an attempt to make the default uncaught exception strategy parameterized. I'd appreciate a review. It changes the system behavior slightly by setting the handler only when an NIOServerCnxnFactory is constructed, instead of as static class code. That means it will overwrite the handler for each new factory, if the strategy is not "none". Feel free to hammer me on style. I erred on the side of verbosity with variable names, but couldn't find anything stating the preferred line length so sometimes lines get a bit long.
        Hide
        Jeremy Stribling added a comment -

        Thanks Michi, I will give it a try.

        Show
        Jeremy Stribling added a comment - Thanks Michi, I will give it a try.
        Hide
        Michi Mutsuzaki added a comment -

        Hi Jeremy,

        • ZooKeeperServerMain creates a ServerConfig, which in turn creates QuorumPeerConfig that parses a config file. You can modify QuorumPeerConfig.parseProperties() to include the new flag.
        • Then, initialize the flag in ServerConfig.readFrom().
        • Coming back to ZooKeeperServerMain, runFromConfig() calls ServerCnxnFactory.configure(). You can probably add a new parameter here to pass the flag to ServerCnxnFactory.
        • You might need to move the Thread.setDefaultUncaughtExceptionHandler() call in NIOServerCnxnFactory since it doesn't have the flag until configure() is called.
        • Finally, include the flag in ZooKeeperServer.dumpConf() so that it gets printed with "conf" 4-letter word command.

        --Michi

        Show
        Michi Mutsuzaki added a comment - Hi Jeremy, ZooKeeperServerMain creates a ServerConfig, which in turn creates QuorumPeerConfig that parses a config file. You can modify QuorumPeerConfig.parseProperties() to include the new flag. Then, initialize the flag in ServerConfig.readFrom(). Coming back to ZooKeeperServerMain, runFromConfig() calls ServerCnxnFactory.configure(). You can probably add a new parameter here to pass the flag to ServerCnxnFactory. You might need to move the Thread.setDefaultUncaughtExceptionHandler() call in NIOServerCnxnFactory since it doesn't have the flag until configure() is called. Finally, include the flag in ZooKeeperServer.dumpConf() so that it gets printed with "conf" 4-letter word command. --Michi
        Hide
        Jeremy Stribling added a comment -

        I don't have a need for it on the client side. In general I think the user of either library should be able to override the default uncaught exception handler if they want to, but that could be future work. I can drop the client side out of this patch.

        However, I don't have any experience adding flags to ZK – can someone point me at a good example I can follow?

        Show
        Jeremy Stribling added a comment - I don't have a need for it on the client side. In general I think the user of either library should be able to override the default uncaught exception handler if they want to, but that could be future work. I can drop the client side out of this patch. However, I don't have any experience adding flags to ZK – can someone point me at a good example I can follow?
        Hide
        Camille Fournier added a comment -

        The overwhelming feedback on the ML was the desire for a flag to turn this on or off... I'm a bit torn but can we see a patch with that option? Also, do you really need this in the client side, or is it best to just remove the uncaught exception handler completely from that part of the code?

        Show
        Camille Fournier added a comment - The overwhelming feedback on the ML was the desire for a flag to turn this on or off... I'm a bit torn but can we see a patch with that option? Also, do you really need this in the client side, or is it best to just remove the uncaught exception handler completely from that part of the code?
        Hide
        Jeremy Stribling added a comment -

        I'm happy to update the patch to avoid exiting if the error is an instance of ThreadDeath. Seems reasonable to me.

        Show
        Jeremy Stribling added a comment - I'm happy to update the patch to avoid exiting if the error is an instance of ThreadDeath. Seems reasonable to me.
        Hide
        Henry Robinson added a comment -

        The current method of logging the error seems problematic - if the exception really is OOME then the string passed to LOG will perhaps fail to be allocated (since it can't be interned because of the concatenation).

        It doesn't seem correct to exit on ThreadDeath either, by a cursory reading of the documentation.

        The advice on java.lang.error appears to be that we should not be trying to catch it at all. That said, I'm not against failing fast because I've seen java server processes go OOM and then just keep ticking along like a zombie, causing strange bugs when some operations appear to take effect and some don't.

        Show
        Henry Robinson added a comment - The current method of logging the error seems problematic - if the exception really is OOME then the string passed to LOG will perhaps fail to be allocated (since it can't be interned because of the concatenation). It doesn't seem correct to exit on ThreadDeath either, by a cursory reading of the documentation. The advice on java.lang.error appears to be that we should not be trying to catch it at all. That said, I'm not against failing fast because I've seen java server processes go OOM and then just keep ticking along like a zombie, causing strange bugs when some operations appear to take effect and some don't.
        Hide
        Camille Fournier added a comment -

        My biggest question mark is around exiting on ThreadDeath, and I'd like to get a bit of community feedback before committing. But if I can get some color around those concerns I'm ok with the patch.

        Show
        Camille Fournier added a comment - My biggest question mark is around exiting on ThreadDeath, and I'd like to get a bit of community feedback before committing. But if I can get some color around those concerns I'm ok with the patch.
        Hide
        Jeremy Stribling added a comment -

        Regarding the findbugs warnings: any exception thrown from the handler will be ignored, so throwing a RuntimeException isn't really an option. See http://docs.oracle.com/javase/6/docs/api/java/lang/Thread.UncaughtExceptionHandler.html#uncaughtException%28java.lang.Thread,%20java.lang.Throwable%29 .

        Show
        Jeremy Stribling added a comment - Regarding the findbugs warnings: any exception thrown from the handler will be ignored, so throwing a RuntimeException isn't really an option. See http://docs.oracle.com/javase/6/docs/api/java/lang/Thread.UncaughtExceptionHandler.html#uncaughtException%28java.lang.Thread,%20java.lang.Throwable%29 .
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12522293/ZOOKEEPER-1442.patch
        against trunk revision 1307644.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522293/ZOOKEEPER-1442.patch against trunk revision 1307644. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1031//console This message is automatically generated.
        Hide
        Jeremy Stribling added a comment -

        Fixed patch base directory. Anyone care to code review?

        Show
        Jeremy Stribling added a comment - Fixed patch base directory. Anyone care to code review?
        Hide
        Jeremy Stribling added a comment -

        No test, but not sure one is needed.

        Show
        Jeremy Stribling added a comment - No test, but not sure one is needed.
        Hide
        Jeremy Stribling added a comment -

        I have a patch ready. I can't figure out how to add a test that will verify that the system calls System.exit() correctly when an Error is thrown – I don't think the test harness is set up for that. Anyone have any pointers?

        Show
        Jeremy Stribling added a comment - I have a patch ready. I can't figure out how to add a test that will verify that the system calls System.exit() correctly when an Error is thrown – I don't think the test harness is set up for that. Anyone have any pointers?

          People

          • Assignee:
            Jeremy Stribling
            Reporter:
            Jeremy Stribling
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:

              Development