Uploaded image for project: 'Accumulo'
  1. Accumulo
  2. ACCUMULO-1708

Error during minor compaction left tserver in bad state

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.8.2, 2.0.0
    • Component/s: None
    • Labels:
      None

      Description

      A tserver experienced a OOME during minor compaction. This OOME was thrown because java could not create a native thread. Minor compactions only catch declared exceptions and RuntimeExceptions. This left the system in a state where the compaction was not running but the tserver thought it was. This cause"flush -w" to hang and prevented the tserver from reclaiming memory.

      For whatever reason the OOME handler that kills the process did not kick in (seems it only kicks in w/ OOME related to heap allocation).

      1. ThreadTest.java
        3 kB
        Keith Turner

        Issue Links

          Activity

          Hide
          kturner Keith Turner added a comment -

          I propose catching Error (OutOfMemoryException extends this) in minor compaction and halting the process. I am thinking this may be a good patch for 1.4.5, 1.5.1, and 1.6.0. A more general fix may be to make all server threads catch Error and halt the process, could do this for 1.6. Thoughts?

          Show
          kturner Keith Turner added a comment - I propose catching Error (OutOfMemoryException extends this) in minor compaction and halting the process. I am thinking this may be a good patch for 1.4.5, 1.5.1, and 1.6.0. A more general fix may be to make all server threads catch Error and halt the process, could do this for 1.6. Thoughts?
          Hide
          ecn Eric Newton added a comment -

          +1 Sounds good to me.

          Show
          ecn Eric Newton added a comment - +1 Sounds good to me.
          Hide
          kturner Keith Turner added a comment -

          I investigated using thread groups to handle this problem in a general way. Its possible to create a thread group that will handle any uncaught exceptions. By default when a thread creates a new thread, the thread inherits its creators thread group. Therefore threads created in zookeeper and hdfs code would inherit the calling threads thread group. This seemed like a promising approach, however the zookeeper and hdfs code catch Throwable and log it. If this code caught RunTimeException then the Thread group approach would work nicely because Errors would percolate up to the handler in the ThreadGroup.

          DFSClient.DFSOutputStream.DataStreamer
          org.apache.zookeeper.ClientCnxn$EventThread.run()
          org.apache.zookeeper.ClientCnxn$SendThread.run()

          We can make all watchers we pass to zookeeper handle Errors, but it is still possible that zookeeper code executing in a background thread could encounter a OOME. I think I have seen a zookeeper thread die from an OOME before and as a result a tserver that lost its lock did not die. For the threads created by DFSClient I do not think we have any direct control.

          Show
          kturner Keith Turner added a comment - I investigated using thread groups to handle this problem in a general way. Its possible to create a thread group that will handle any uncaught exceptions. By default when a thread creates a new thread, the thread inherits its creators thread group. Therefore threads created in zookeeper and hdfs code would inherit the calling threads thread group. This seemed like a promising approach, however the zookeeper and hdfs code catch Throwable and log it. If this code caught RunTimeException then the Thread group approach would work nicely because Errors would percolate up to the handler in the ThreadGroup. DFSClient.DFSOutputStream.DataStreamer org.apache.zookeeper.ClientCnxn$EventThread.run() org.apache.zookeeper.ClientCnxn$SendThread.run() We can make all watchers we pass to zookeeper handle Errors, but it is still possible that zookeeper code executing in a background thread could encounter a OOME. I think I have seen a zookeeper thread die from an OOME before and as a result a tserver that lost its lock did not die. For the threads created by DFSClient I do not think we have any direct control.
          Hide
          kturner Keith Turner added a comment -

          Some zookeeper issues of interest : ZOOKEEPER-1100 ZOOKEEPER-1442 ZOOKEEPER-1375.

          Show
          kturner Keith Turner added a comment - Some zookeeper issues of interest : ZOOKEEPER-1100 ZOOKEEPER-1442 ZOOKEEPER-1375 .
          Hide
          kturner Keith Turner added a comment -

          Attaching the java program I used to experiment with thread groups and zookeeper threads.

          Show
          kturner Keith Turner added a comment - Attaching the java program I used to experiment with thread groups and zookeeper threads.
          Hide
          vines John Vines added a comment -

          So it sounds like your ThreadGroup approach is a partial solution, but it's the best we got so far. Do we see an overall fix including this approach? If so, I think we should go ahead and see about making this change and document the possibilities that can then occur for future tickets.

          Show
          vines John Vines added a comment - So it sounds like your ThreadGroup approach is a partial solution, but it's the best we got so far. Do we see an overall fix including this approach? If so, I think we should go ahead and see about making this change and document the possibilities that can then occur for future tickets.
          Hide
          elserj Josh Elser added a comment -

          Keith Turner is this still something you are wanting to try to get into 1.6.0? I remember being pointed to some pages which strongly recommended against catching Error. With the introduction of

          -XX:OnOutOfMemoryError="${ACCUMULO_KILL_CMD:-kill -9 %p}"

          in bin/accumulo and the ability for users to define their own ACCUMULO_KILL_CMD in accumulo-env.sh, is this even as big of a worry as previously?

          Show
          elserj Josh Elser added a comment - Keith Turner is this still something you are wanting to try to get into 1.6.0? I remember being pointed to some pages which strongly recommended against catching Error. With the introduction of -XX:OnOutOfMemoryError="${ACCUMULO_KILL_CMD:-kill -9 %p}" in bin/accumulo and the ability for users to define their own ACCUMULO_KILL_CMD in accumulo-env.sh , is this even as big of a worry as previously?
          Hide
          kturner Keith Turner added a comment -

          in bin/accumulo and the ability for users to define their own ACCUMULO_KILL_CMD in accumulo-env.sh, is this even as big of a worry as previously?

          The out of memory kill flag was present, unfortunately this did not kick in. See the last sentence in the ticket description. I think that flag may only kick in for heap allocation errors. However other java code throws OOME. For example when java can not create a new thread it will throw OOME, but this does not seem to trigger -XX:OnOutOfMemoryError.

          I think the best course of action is to modify Accumulo code to catch Error and halt for the threads it creates (maybe use thread groups to do this). For the threads created by zookeeper and HDFS, create follow on accumulo, zookeeper, and hadoop tickets as needed. I am not sure if I will have time to do this for 1.6, I may push it to 1.7.

          Show
          kturner Keith Turner added a comment - in bin/accumulo and the ability for users to define their own ACCUMULO_KILL_CMD in accumulo-env.sh, is this even as big of a worry as previously? The out of memory kill flag was present, unfortunately this did not kick in. See the last sentence in the ticket description. I think that flag may only kick in for heap allocation errors. However other java code throws OOME. For example when java can not create a new thread it will throw OOME, but this does not seem to trigger -XX:OnOutOfMemoryError . I think the best course of action is to modify Accumulo code to catch Error and halt for the threads it creates (maybe use thread groups to do this). For the threads created by zookeeper and HDFS, create follow on accumulo, zookeeper, and hadoop tickets as needed. I am not sure if I will have time to do this for 1.6, I may push it to 1.7.
          Hide
          phrocker marco polo added a comment -

          Perhaps it might behoove us to centralize some of this error handling in an uncaught exception handler so that we can exit when we see fit. I'm not sure It's prudent to immediate exit on an OutOfMemoryError, but we could detect from the MemoryMXBean bean our memory state and make determinations on how to clean up.

          And when I say, "us" I mean you.

          I'm kidding. When I mean you, I actually mean Josh.

          Show
          phrocker marco polo added a comment - Perhaps it might behoove us to centralize some of this error handling in an uncaught exception handler so that we can exit when we see fit. I'm not sure It's prudent to immediate exit on an OutOfMemoryError, but we could detect from the MemoryMXBean bean our memory state and make determinations on how to clean up. And when I say, "us" I mean you. I'm kidding. When I mean you, I actually mean Josh.
          Hide
          elserj Josh Elser added a comment -

          I'm kidding. When I mean you, I actually mean Josh.

          I'm a fan of all things behooved.

          I think the best course of action is to modify Accumulo code to catch Error and halt for the threads it creates (maybe use thread groups to do this)

          If you s/Error/OutOfMemoryError/, I think I'd agree with your approach. I've never seen a JVM that throws an OOME (for heap or native thread reasons) that actually recovered successfully. While I'm not a big fan of a process hanging itself, I think this case it might be a good idea.

          Show
          elserj Josh Elser added a comment - I'm kidding. When I mean you, I actually mean Josh. I'm a fan of all things behooved. I think the best course of action is to modify Accumulo code to catch Error and halt for the threads it creates (maybe use thread groups to do this) If you s/Error/OutOfMemoryError/, I think I'd agree with your approach. I've never seen a JVM that throws an OOME (for heap or native thread reasons) that actually recovered successfully. While I'm not a big fan of a process hanging itself, I think this case it might be a good idea.

            People

            • Assignee:
              Unassigned
              Reporter:
              kturner Keith Turner
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development