HBase / HBASE-3906

When HMaster is running, there are many RegionLoad instances (far more than the number of regions), which risks an OutOfMemoryError (OOME).

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.2, 0.90.3
    • Fix Version/s: 0.90.4
    • Component/s: master
    • Labels:
      None
    • Environment:

      1 HMaster, 4 RegionServers, about 100,000 regions.

    • Hadoop Flags:
      Reviewed

      Description

      1. Start the HBase cluster.
      2. After the HMaster finishes region assignment, use jmap to dump the HMaster's heap.
      3. Analyze the dump file with MAT (Eclipse Memory Analyzer): there are far too many RegionLoad instances, and together they occupy more than 3 GB of memory.

      Attachments:
      1. HBASE-3906.patch (3 kB, jian zhang)
      2. HBASE-3906.patch (0.9 kB, jian zhang)

        Activity

        Ted Yu added a comment -

        The patch doesn't apply to trunk, where the heartbeat has been removed.

        stack added a comment -

        @Ted I think the patch is for the branch only. The branch has the problem; I don't believe TRUNK does.

        @Jian This should work, though it's ugly; i.e. refreshing an HServerInfo instance. (Do we need to keep the load in the Map of regions? What about clearing the load from the HSI we add to the Map of regions to HSI? Would that work? Or is this Map used for balancing?) Does your patch work for you? Any issues with the new synchronized blocks?
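The idea of clearing the load from the HSI stored in the regions map can be illustrated with a small model. This is a minimal sketch, not the HBase 0.90 code. `RegionMapLoadSketch`, `ServerInfo`, `ServerLoad`, `RegionLoad`, `buildRegionMap`, `retainedRegionLoads` and `withoutLoad` are all hypothetical, simplified stand-ins, and `RegionLoad` here only borrows the real class's name. The sketch assumes, as the thread implies, that each value in the master's region-to-server map was the HServerInfo received with a server report, and that this HServerInfo carried a full load with a fresh RegionLoad for every region on that server. Opening one region per report is a worst-case simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the master's region-to-server map (not HBase's API). */
public class RegionMapLoadSketch {

  static final class RegionLoad {
    final String regionName;
    RegionLoad(String regionName) { this.regionName = regionName; }
  }

  static final class ServerLoad {
    final List<RegionLoad> regionsLoad;
    ServerLoad(List<RegionLoad> regionsLoad) { this.regionsLoad = regionsLoad; }
  }

  static final class ServerInfo {
    final String serverName;
    final ServerLoad load; // null once the load has been cleared

    ServerInfo(String serverName, ServerLoad load) {
      this.serverName = serverName;
      this.load = load;
    }

    /** Copy that keeps the server identity but drops the per-region load. */
    ServerInfo withoutLoad() { return new ServerInfo(serverName, null); }
  }

  /**
   * Simulates one server opening `regions` regions, one per report. Each report
   * carries new RegionLoad objects for every region currently on the server, and
   * the reported ServerInfo becomes the map value for the region that just opened.
   */
  static Map<String, ServerInfo> buildRegionMap(int regions, boolean clearLoad) {
    Map<String, ServerInfo> regionToServer = new HashMap<>();
    for (int opened = 1; opened <= regions; opened++) {
      List<RegionLoad> snapshot = new ArrayList<>();
      for (int r = 0; r < opened; r++) {
        snapshot.add(new RegionLoad("region-" + r));
      }
      ServerInfo report = new ServerInfo("rs1", new ServerLoad(snapshot));
      regionToServer.put("region-" + (opened - 1), clearLoad ? report.withoutLoad() : report);
    }
    return regionToServer;
  }

  /** Counts RegionLoad objects still referenced from the map's values. */
  static long retainedRegionLoads(Map<String, ServerInfo> regionToServer) {
    long total = 0;
    for (ServerInfo info : regionToServer.values()) {
      if (info.load != null) {
        total += info.load.regionsLoad.size();
      }
    }
    return total;
  }

  public static void main(String[] args) {
    int regions = 500;
    System.out.println("keep load:  " + retainedRegionLoads(buildRegionMap(regions, false)));
    System.out.println("clear load: " + retainedRegionLoads(buildRegionMap(regions, true)));
  }
}
```

Under these assumptions, keeping the load retains 1 + 2 + ... + n RegionLoad objects for n regions (125,250 for the 500 regions in `main`), which is far more than the number of regions. Storing a load-free copy retains none. The cost is one small ServerInfo copy per region instead of a shared instance, which is the trade-off raised later in the thread.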

        Andrew Purtell added a comment -

        How many of those "3G" of objects on the heap are live?

        jian zhang added a comment -

        1. Ted: this patch is only for the branch.
        2. Andrew: in my HMaster dump there are 1,481 HServerInfo and HServerLoad objects and 24,423,058 RegionLoad objects; each RegionLoad occupies 136 B. I'm not a native speaker, so I'm not sure I understand your question correctly. Can I take "live objects" to mean objects that cannot be garbage collected by the JVM? If so, I think all of these objects are live.
        3. stack: I tested several scenarios; the patch works correctly and I found no issues with the synchronized blocks.
        Indeed, refreshing the HServerInfo is not elegant enough, and I think balancing doesn't use the load of the HSI in the regions map. Following your suggestion, I cleared the load and patched my cluster to test; so far it works OK. I will test more scenarios and then provide the new patch for you to review again.
        BTW, one HServerInfo object occupies about 350 B of memory even with the load cleared. Without my ugly refreshing solution, the worst case is one HServerInfo object per region; if a big HBase cluster has 500,000 regions, the HServerInfo objects will occupy about 175,000,000 B of memory. Do you think this is acceptable?
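The two memory figures in this comment are plain multiplications and can be checked with a short calculation. This is a minimal sketch: `HeapEstimate` and `bytes` are illustrative names. The per-object sizes (136 B per RegionLoad, about 350 B per load-free HServerInfo) are the ones Jian reported and are not computed here.

```java
/** Back-of-the-envelope heap estimates from the object counts and sizes reported in the thread. */
public class HeapEstimate {

  /** Total bytes for `count` objects of `bytesPerObject` each. */
  static long bytes(long count, long bytesPerObject) {
    return count * bytesPerObject;
  }

  public static void main(String[] args) {
    // RegionLoad objects observed in the HMaster heap dump.
    long regionLoadBytes = bytes(24_423_058L, 136L);
    // Worst case after clearing the load: one ~350 B HServerInfo per region.
    long serverInfoBytes = bytes(500_000L, 350L);

    System.out.printf("RegionLoad:  %,d B (%.2f GiB)%n",
        regionLoadBytes, regionLoadBytes / (double) (1L << 30));
    System.out.printf("HServerInfo: %,d B (%.1f MiB)%n",
        serverInfoBytes, serverInfoBytes / (double) (1L << 20));
  }
}
```

The first product is 24,423,058 × 136 B = 3,321,535,888 B, about 3.09 GiB. This is consistent with the "more than 3 GB" in the description. The second product is 500,000 × 350 B = 175,000,000 B, about 167 MiB.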

        stack added a comment -

        Jian: Thanks for trying out my suggestion. I think 175M is fine if you have 500k regions.

        stack added a comment -

        Jian: Yes, Andrew is asking how many of the 24M objects cannot be collected by the JVM. Does your heap analysis tool have a way to exclude dead objects and show only 'live objects'?
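Andrew's question turns on the difference between everything on the heap and what is still reachable. On a HotSpot JVM, you can take a dump restricted to live objects with `jmap -dump:live,format=b,file=<file> <pid>`. From inside the process, `com.sun.management.HotSpotDiagnosticMXBean.dumpHeap(outputFile, live)` does the same. The sketch below shows the in-process variant. `LiveHeapDump`, `dumpHeap(Path, boolean)` and the file naming are illustrative choices. The sketch assumes a HotSpot-based JDK, where this MXBean is available.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

/** Writes an HPROF heap dump of the current JVM (HotSpot only). */
public class LiveHeapDump {

  /**
   * Dumps the heap into `dir`. With live = true only reachable objects are
   * written; with live = false unreachable (collectable) objects are included too.
   * The target file must not already exist and should end in ".hprof".
   */
  static Path dumpHeap(Path dir, boolean live) throws IOException {
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    Path file = dir.resolve(live ? "live.hprof" : "all.hprof");
    bean.dumpHeap(file.toString(), live);
    return file;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("heapdump");
    Path file = dumpHeap(dir, true);
    System.out.println(file + ": " + Files.size(file) + " bytes");
  }
}
```

You can open the resulting file in MAT. Dumping with `live = true` guarantees that every object you see is still reachable.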

        jian zhang added a comment -

        The 24M objects are all live; dead objects are not included.
        I tested the scenarios below with this new patch:
        1. Start the cluster normally, insert data, and then dump the HMaster's memory.
        2. While the cluster is running, kill the active HMaster so that the standby HMaster becomes active, then dump the new active HMaster's memory.
        3. Kill a RegionServer or add a new one to the running cluster; when balancing has finished, dump the HMaster's memory.

        In all of the scenarios above, the HMaster holds no unnecessary HServerLoad objects and balancing still works.

        jian zhang added a comment -

        Please use this new attachment.

        stack added a comment -

        Committed to branch (doesn't make sense on TRUNK). Thanks for the patch, Jian.


          People

          • Assignee: Unassigned
          • Reporter: jian zhang
          • Votes: 0
          • Watchers: 1


              Time Tracking

              Estimated: 168h
              Remaining: 168h
              Logged: Not Specified
