Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-5141

Memory leak in MonitoredRPCHandlerImpl

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.92.0
    • Fix Version/s: 0.92.0, 0.94.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      I got a pretty reliable way of OOME'ing my region servers. Using a big payload (64MB in my case), a default heap and default number of handlers, it's not too long that all the MonitoredRPCHandlerImpl hold on a 64MB reference and once a compaction kicks in it kills everything.

      The issue is that even after the RPC call is done, the packet still lives in MonitoredRPCHandlerImpl.

      Will attach a screen shot of jprofiler's analysis in a moment.

      This is a blocker for 0.92.0, anyone using a high number of handlers and bigish values will kill themselves.

      1. HBASE-5141.patch
        1 kB
        Jean-Daniel Cryans
      2. HBASE-5141-v2.patch
        1 kB
        Jean-Daniel Cryans
      3. Screen Shot 2012-01-06 at 3.03.09 PM.png
        204 kB
        Jean-Daniel Cryans

        Activity

        Hide
        jdcryans Jean-Daniel Cryans added a comment -

        This screen shot shows how MonitoredRPCHandlerImpl are all using 6% of the heap because they are holding on the packets.

        Show
        jdcryans Jean-Daniel Cryans added a comment - This screen shot shows how MonitoredRPCHandlerImpl are all using 6% of the heap because they are holding on the packets.
        Hide
        tlipcon Todd Lipcon added a comment -

        Part of me thinks this is a configuration error – a conservative admin should always assume that the memory usage of IPC buffers = numHandlers*maxPayload, since you could always have the threads all concurrently handling large calls. So memory should be allocated for this. The fact that the memory is kept around until the next call makes it more likely you'd hit this, but it's still a potential problem regardless.

        I think the proper fix for this problem is to keep an atomic counter for the amount of memory used by IPC handlers, and gate the read calls off the wire based on a memory budget.

        Show
        tlipcon Todd Lipcon added a comment - Part of me thinks this is a configuration error – a conservative admin should always assume that the memory usage of IPC buffers = numHandlers*maxPayload, since you could always have the threads all concurrently handling large calls. So memory should be allocated for this. The fact that the memory is kept around until the next call makes it more likely you'd hit this, but it's still a potential problem regardless. I think the proper fix for this problem is to keep an atomic counter for the amount of memory used by IPC handlers, and gate the read calls off the wire based on a memory budget.
        Hide
        zhihyu@ebaysf.com Ted Yu added a comment -

        One solution is to release the packet before WritableRpcEngine.call() returns ?

                status.setRPCPacket(null);
                return retVal;
        
        Show
        zhihyu@ebaysf.com Ted Yu added a comment - One solution is to release the packet before WritableRpcEngine.call() returns ? status.setRPCPacket(null); return retVal;
        Hide
        jdcryans Jean-Daniel Cryans added a comment -

        I think the proper fix for this problem is to keep an atomic counter for the amount of memory used by IPC handlers, and gate the read calls off the wire based on a memory budget.

        There's more to it, it's not just the handlers that have data but their queues too. By default we add 10 for each handler. I believe this is unrelated to this jira's issue.

        Show
        jdcryans Jean-Daniel Cryans added a comment - I think the proper fix for this problem is to keep an atomic counter for the amount of memory used by IPC handlers, and gate the read calls off the wire based on a memory budget. There's more to it, it's not just the handlers that have data but their queues too. By default we add 10 for each handler. I believe this is unrelated to this jira's issue.
        Hide
        tlipcon Todd Lipcon added a comment -

        Yep, sorry, the memory budgeting would have to be before the data is read into the queue in the IPC server.

        I agree that nulling out the rpc packet will fix the issue as reported here. Just saying that the issue doesn't seem likely to affect most use cases where RPCs are of mixed size and where people have already budgeted their heap to fit a bunch of calls.

        Show
        tlipcon Todd Lipcon added a comment - Yep, sorry, the memory budgeting would have to be before the data is read into the queue in the IPC server. I agree that nulling out the rpc packet will fix the issue as reported here. Just saying that the issue doesn't seem likely to affect most use cases where RPCs are of mixed size and where people have already budgeted their heap to fit a bunch of calls.
        Hide
        jdcryans Jean-Daniel Cryans added a comment -

        Testing this patch, I like it better than passing null as the monitor takes care of itself.

        Show
        jdcryans Jean-Daniel Cryans added a comment - Testing this patch, I like it better than passing null as the monitor takes care of itself.
        Hide
        jdcryans Jean-Daniel Cryans added a comment -

        After much testing, it's actually params not packet that has all that data.

        Show
        jdcryans Jean-Daniel Cryans added a comment - After much testing, it's actually params not packet that has all that data.
        Hide
        jdcryans Jean-Daniel Cryans added a comment -

        This second patch survives my little test. What I was missing was that the packet also contains a reference to the params, so I have to clear out both (that was a bit confusing).

        Show
        jdcryans Jean-Daniel Cryans added a comment - This second patch survives my little test. What I was missing was that the packet also contains a reference to the params, so I have to clear out both (that was a bit confusing).
        Hide
        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12509797/HBASE-5141-v2.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 javadoc. The javadoc tool appears to have generated -151 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.mapreduce.TestImportTsv
        org.apache.hadoop.hbase.mapred.TestTableMapReduce
        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/696//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/696//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/696//console

        This message is automatically generated.

        Show
        hadoopqa Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12509797/HBASE-5141-v2.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -151 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 79 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/696//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/696//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/696//console This message is automatically generated.
        Hide
        stack stack added a comment -

        +1 on the patch. Small.

        Show
        stack stack added a comment - +1 on the patch. Small.
        Hide
        stack stack added a comment -

        Let me commit....

        Show
        stack stack added a comment - Let me commit....
        Hide
        stack stack added a comment -

        Committed 0.92 and trunk.

        Show
        stack stack added a comment - Committed 0.92 and trunk.
        Hide
        mikhail Mikhail Bautin added a comment -

        FYI: The build is broken in trunk because of this patch.

        Show
        mikhail Mikhail Bautin added a comment - FYI: The build is broken in trunk because of this patch.
        Hide
        mikhail Mikhail Bautin added a comment -

        I get the error shown at http://pastebin.com/AdAp0M35 when trying to build the following commit:

        Author: stack <stack@13f79535-47bb-0310-9956-ffa450edef68>
        Date: Sat Jan 7 14:16:11 2012

        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

        git-svn-id: http://svn.apache.org/repos/asf/hbase/trunk@1228740

        Show
        mikhail Mikhail Bautin added a comment - I get the error shown at http://pastebin.com/AdAp0M35 when trying to build the following commit: Author: stack <stack@13f79535-47bb-0310-9956-ffa450edef68> Date: Sat Jan 7 14:16:11 2012 HBASE-5141 Memory leak in MonitoredRPCHandlerImpl git-svn-id: http://svn.apache.org/repos/asf/hbase/trunk@1228740
        Hide
        mikhail Mikhail Bautin added a comment -

        Actually, the committed patch contains more stuff than the patch attached to the JIRA:

        svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228740

        Was the security version of the patch committed into trunk or something?

        Show
        mikhail Mikhail Bautin added a comment - Actually, the committed patch contains more stuff than the patch attached to the JIRA: svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228740 Was the security version of the patch committed into trunk or something?
        Hide
        mikhail Mikhail Bautin added a comment -

        Correction: use this svn command:

        svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228739

        Show
        mikhail Mikhail Bautin added a comment - Correction: use this svn command: svn diff http://svn.apache.org/repos/asf/hbase/trunk -r1228739
        Hide
        stack stack added a comment -

        Fixing....

        Show
        stack stack added a comment - Fixing....
        Hide
        stack stack added a comment -

        Fixed. Sorry about that Mikhail. Thanks for flagging it.

        Show
        stack stack added a comment - Fixed. Sorry about that Mikhail. Thanks for flagging it.
        Hide
        hudson Hudson added a comment -

        Integrated in HBase-0.92 #233 (See https://builds.apache.org/job/HBase-0.92/233/)
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        Show
        hudson Hudson added a comment - Integrated in HBase-0.92 #233 (See https://builds.apache.org/job/HBase-0.92/233/ ) HBASE-5141 Memory leak in MonitoredRPCHandlerImpl stack : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        Hide
        hudson Hudson added a comment -

        Integrated in HBase-TRUNK-security #67 (See https://builds.apache.org/job/HBase-TRUNK-security/67/)
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REDO
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REVERT. OVER-COMMITTED. REVERTING ALL SO CAN REDO COMMIT
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Show
        hudson Hudson added a comment - Integrated in HBase-TRUNK-security #67 (See https://builds.apache.org/job/HBase-TRUNK-security/67/ ) HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REDO HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REVERT. OVER-COMMITTED. REVERTING ALL SO CAN REDO COMMIT HBASE-5141 Memory leak in MonitoredRPCHandlerImpl stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Hide
        hudson Hudson added a comment -

        Integrated in HBase-0.92-security #67 (See https://builds.apache.org/job/HBase-0.92-security/67/)
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

        stack :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        Show
        hudson Hudson added a comment - Integrated in HBase-0.92-security #67 (See https://builds.apache.org/job/HBase-0.92-security/67/ ) HBASE-5141 Memory leak in MonitoredRPCHandlerImpl stack : Files : /hbase/branches/0.92/CHANGES.txt /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        Hide
        hudson Hudson added a comment -

        Integrated in HBase-TRUNK #2617 (See https://builds.apache.org/job/HBase-TRUNK/2617/)
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REDO
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REVERT. OVER-COMMITTED. REVERTING ALL SO CAN REDO COMMIT
        HBASE-5141 Memory leak in MonitoredRPCHandlerImpl

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java

        stack :
        Files :

        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
        Show
        hudson Hudson added a comment - Integrated in HBase-TRUNK #2617 (See https://builds.apache.org/job/HBase-TRUNK/2617/ ) HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REDO HBASE-5141 Memory leak in MonitoredRPCHandlerImpl – REVERT. OVER-COMMITTED. REVERTING ALL SO CAN REDO COMMIT HBASE-5141 Memory leak in MonitoredRPCHandlerImpl stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ClosedRegionHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/monitoring/MonitoredRPCHandlerImpl.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java

          People

          • Assignee:
            jdcryans Jean-Daniel Cryans
            Reporter:
            jdcryans Jean-Daniel Cryans
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development