HBase / HBASE-2027

HConnectionManager.HBASE_INSTANCES leaks TableServers

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.3, 0.90.0
    • Component/s: None
    • Labels: None

      Description

      HConnectionManager.HBASE_INSTANCES is a WeakHashMap from HBaseConfiguration to TableServers. However, each TableServers has a strong reference back to the HBaseConfiguration key so they are never freed. (See note at http://java.sun.com/javase/6/docs/api/java/util/WeakHashMap.html : "Implementation note: The value objects in a WeakHashMap are held by ordinary strong references. Thus care should be taken to ensure that value objects do not strongly refer to their own keys, either directly or indirectly, since that will prevent the keys from being discarded.")
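
      For illustration only (simplified placeholder classes, not the actual HBase types), the leaking pattern looks roughly like this:

          import java.util.Map;
          import java.util.WeakHashMap;

          class Config { }                          // stands in for HBaseConfiguration

          class Connection {                        // stands in for TableServers
              // The value keeps a strong reference back to the map key, so the key
              // stays strongly reachable and the WeakHashMap never drops the entry.
              private final Config conf;
              Connection(Config conf) { this.conf = conf; }
          }

          class ConnectionCache {
              static final Map<Config, Connection> INSTANCES =
                  new WeakHashMap<Config, Connection>();

              static synchronized Connection get(Config conf) {
                  Connection conn = INSTANCES.get(conf);
                  if (conn == null) {
                      conn = new Connection(conf);
                      INSTANCES.put(conf, conn);
                  }
                  return conn;
              }
          }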

      Moreover, HBaseConfiguration implements hashCode() but not equals() so identical HBaseConfiguration objects each get their own TableServers object.
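
      A quick sketch of the consequence (hypothetical ConfigKey class, not HBaseConfiguration itself): because equals() falls back to reference identity, two logically identical keys never match, so the map accumulates one entry per instance:

          import java.util.HashMap;
          import java.util.Map;

          class ConfigKey {
              private final String quorum;
              ConfigKey(String quorum) { this.quorum = quorum; }

              // hashCode() is overridden, but equals() is not, so Object.equals
              // (reference identity) is what HashMap uses to compare keys.
              @Override public int hashCode() { return quorum.hashCode(); }
          }

          public class Demo {
              public static void main(String[] args) {
                  Map<ConfigKey, String> cache = new HashMap<ConfigKey, String>();
                  cache.put(new ConfigKey("zk1"), "connection A");
                  cache.put(new ConfigKey("zk1"), "connection B");
                  System.out.println(cache.size());   // prints 2, not 1
              }
          }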

      We had a long running HBase client process that was creating new HTable() objects, each creating a new HBaseConfiguration() and thus a new TableServers object. It eventually went OOM, and gave a heap dump indicating 360 MB of data retained by HBASE_INSTANCES.

      Attachments

      1. 2027-LRU.patch (3 kB, Dave Latham)


          Activity

          Dave Latham added a comment -

          Here's a patch for 0.20 that changes the WeakHashMap to use WeakReferences to the TableServers for values, and adds an equals method to HBaseConfiguration so that identical configs will share the same TableServers instead of each getting their own. It also corrects the javadoc for HBaseConfiguration.hashCode().
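
          In rough outline (placeholder types, not the patch itself), the cache then only weakly references the connections, so a value no longer pins its own key:

              import java.lang.ref.WeakReference;
              import java.util.Map;
              import java.util.WeakHashMap;

              class Config { }                      // stands in for HBaseConfiguration
              class Connection {                    // stands in for TableServers
                  private final Config conf;
                  Connection(Config conf) { this.conf = conf; }
              }

              class ConnectionCache {
                  private static final Map<Config, WeakReference<Connection>> INSTANCES =
                      new WeakHashMap<Config, WeakReference<Connection>>();

                  static synchronized Connection get(Config conf) {
                      WeakReference<Connection> ref = INSTANCES.get(conf);
                      Connection conn = (ref == null) ? null : ref.get();
                      if (conn == null) {
                          // Either never cached or already collected; (re)create it.
                          conn = new Connection(conf);
                          INSTANCES.put(conf, new WeakReference<Connection>(conn));
                      }
                      return conn;
                  }
              }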

          Dave Latham added a comment -

          This patch should apply to both trunk and branch.

          Dave Latham added a comment -

          I spent some more time thinking about this issue and reading through HBASE-1251, which set up the current behavior, for background.

          The cache of connections as it stands now is never reduced or freed due to the strong reference each TableServers holds back to the HBaseConfiguration key in the HBASE_INSTANCES map. However, with the first patch I provided, if client code keeps a strong reference to an HBaseConfiguration but not to the connection itself, then the connection information may be freed even if the HBaseConfiguration object is still around. This is not desirable either.

          Another possibility would be to make the reference that TableServers holds to its HBaseConfiguration a WeakReference, rather than having the HBASE_INSTANCES values hold WeakReferences to the TableServers. However, this also presents problems: if the client held a strong reference to the HConnection but not to the HBaseConfiguration (less likely than the earlier case, I believe), the configuration could be collected and methods that require it would fail.

          I propose a simpler approach: get rid of the WeakHashMap / WeakReferences entirely and make the HBASE_INSTANCES map a simple LRU cache of the last 10 HBaseConfiguration-to-HConnection mappings. This bounds the amount of memory used by the cache (better than the current implementation), at the risk that HBase clients which repeatedly use more than 10 different configurations will lose cached connections (I believe this to be very rare, but please enlighten me if I am mistaken).
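
          As a rough sketch of the idea (placeholder types, not the attached patch), an access-ordered java.util.LinkedHashMap with removeEldestEntry gives exactly such a bounded LRU cache:

              import java.util.LinkedHashMap;
              import java.util.Map;

              class Config { }                      // stands in for HBaseConfiguration
              class Connection {                    // stands in for HConnection/TableServers
                  private final Config conf;
                  Connection(Config conf) { this.conf = conf; }
              }

              class ConnectionLruCache {
                  // 10 as proposed here; a later comment bumps this to 31.
                  private static final int MAX_CACHED_INSTANCES = 10;

                  // accessOrder = true: iteration order is least-recently-accessed first.
                  private static final Map<Config, Connection> INSTANCES =
                      new LinkedHashMap<Config, Connection>(16, 0.75f, true) {
                          @Override
                          protected boolean removeEldestEntry(Map.Entry<Config, Connection> eldest) {
                              // Evict the least recently used entry once the cap is exceeded.
                              return size() > MAX_CACHED_INSTANCES;
                          }
                      };

                  static synchronized Connection get(Config conf) {
                      Connection conn = INSTANCES.get(conf);
                      if (conn == null) {
                          conn = new Connection(conf);
                          INSTANCES.put(conf, conn);
                      }
                      return conn;
                  }
              }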

          I'm attaching a new patch with this solution. Do you think it would be better to make the size of this cache a config option instead of hardcoding it?

          stack added a comment -

          Your simpler bounded count of instances sounds good to me. Ten should be more than enough. You could make it configurable without adding the config to hbase-default.xml; fellas would have to read the code to figure out that they could change the value. It's a static context, though, when this stuff is being set. I'd say the patch is good enough as is. Unless you object, I will commit.

          You know, set the max to 31. The default ZK maximum for connections from one host is 30. This way, users will hardly ever hit this hard-coded max; they'll get a complaint from ZK instead (and 30 instances is still small potatoes).

          Good on you Dave.

          Dave Latham added a comment -

          I just noticed HBASE-1976, which is definitely related, and the comments there. There might be a better key for HBASE_INSTANCES than the entire HBaseConfiguration. Why don't you hold off on the commit until I (or someone else) gives that a bit more thought.

          stack added a comment -

          @Dave Grand. My uninformed notion is that the way this HBASE_INSTANCES stuff works is broken. A static map will always be problematic. I can understand that within one JVM you might want differently configured connections, but keying on the HBaseConfiguration instance seems off. Table name might be better, but then what if you wanted differently configured connections to the same table? Would that even make sense? Thanks for looking into this.

          Dave Latham added a comment -

          Updated patch to set the max cached HConnection instances to 31.

          I gave it a bit more thought, and I'm not sure what the best way to track and support different connection sets would look like, so I propose we go with this patch as a definite improvement over the current setup and leave further improvement to future work.

          stack added a comment -

          Agreed about a new issue to fix this properly (it can be part of the client rewrite when we take on another RPC, one that is non-blocking NIO rather than handler-based Hadoop RPC).

          Committed to branch and trunk.

          stack added a comment -

          Thanks for the patch Dave.


            People

            • Assignee: Dave Latham
            • Reporter: Dave Latham
            • Votes: 0
            • Watchers: 0
