Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-3792

TableInputFormat leaks ZK connections

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.90.1
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels:
      None
    • Environment:

      Java 1.6.0_24, Mac OS X 10.6.7

      Description

      The TableInputFormat creates an HTable using a new Configuration object, and it never cleans it up. When running a Mapper, the TableInputFormat is instantiated and the ZK connection is created. While this connection is not explicitly cleaned up, the Mapper process eventually exits and thus the connection is closed. Ideally the TableRecordReader would close the connection in its close() method rather than relying on the process to die for connection cleanup. This is fairly easy to implement by overriding TableRecordReader, and also overriding TableInputFormat to specify the new record reader.

      The leak occurs when the JobClient is initializing and needs to retrieves the splits. To get the splits, it instantiates a TableInputFormat. Doing so creates a ZK connection that is never cleaned up. Unlike the mapper, however, my job client process does not die. Thus the ZK connections accumulate.

      I was able to fix the problem by writing my own TableInputFormat that does not initialize the HTable in the getConf() method and does not have an HTable member variable. Rather, it has a variable for the table name. The HTable is instantiated where needed and then cleaned up. For example, in the getSplits() method, I create the HTable, then close the connection once the splits are retrieved. I also create the HTable when creating the record reader, and I have a record reader that closes the connection when done.

      1. patch0.90.4
        11 kB
        Bryan Keller
      2. tableinput.patch
        9 kB
        Bryan Keller

        Issue Links

          Activity

          Hide
          jdcryans Jean-Daniel Cryans added a comment -

          The main issue with not using a new connection is that it breaks CopyTable or any other job that wants to use one cluster then another. I think you're workaround is the way to go until we fix the real source of the problem (HBASE-3777 being one solution to it).

          Show
          jdcryans Jean-Daniel Cryans added a comment - The main issue with not using a new connection is that it breaks CopyTable or any other job that wants to use one cluster then another. I think you're workaround is the way to go until we fix the real source of the problem ( HBASE-3777 being one solution to it).
          Hide
          stack stack added a comment -

          So, HBASE-3777 was committed to trunk. Can we close this issue now as fixed in TRUNK?

          Show
          stack stack added a comment - So, HBASE-3777 was committed to trunk. Can we close this issue now as fixed in TRUNK?
          Hide
          bryanck Bryan Keller added a comment -

          I think the change should still be made. The HBASE-3777 change adds reference counters to connection, so closing the connection here explicitly where needed will decrement the reference counter and allow for connection cleanup.

          Show
          bryanck Bryan Keller added a comment - I think the change should still be made. The HBASE-3777 change adds reference counters to connection, so closing the connection here explicitly where needed will decrement the reference counter and allow for connection cleanup.
          Hide
          bryanck Bryan Keller added a comment -

          I was planning on uploading a patch later today in case it would be useful.

          Show
          bryanck Bryan Keller added a comment - I was planning on uploading a patch later today in case it would be useful.
          Hide
          bryanck Bryan Keller added a comment -

          Here's a patch demonstrating the changes I have implemented in my system, as described above. The patch is for trunk, so the changes are slightly different than what I am using for 0.90.4.

          Show
          bryanck Bryan Keller added a comment - Here's a patch demonstrating the changes I have implemented in my system, as described above. The patch is for trunk, so the changes are slightly different than what I am using for 0.90.4.
          Hide
          bryanck Bryan Keller added a comment -

          BTW, I am having trouble running the tests on trunk so I wasn't able to verify this patch, I'll work on getting my dev environment more functional.

          Show
          bryanck Bryan Keller added a comment - BTW, I am having trouble running the tests on trunk so I wasn't able to verify this patch, I'll work on getting my dev environment more functional.
          Hide
          terry.siu Terry Siu added a comment -

          Bryan, would be able to post a patch of the changes you are using for 0.90.4? I applied the trunk patch to 0.90.4 and aside from one minor flub, the patch was very clean. I left my mapreduce jobs to run overnight and am seeing ZK connections accummulating again, but at a slower rate, so now I'm wondering what differences exist between the changes you made for 0.90.4 versus the one you posted. Thanks!

          Show
          terry.siu Terry Siu added a comment - Bryan, would be able to post a patch of the changes you are using for 0.90.4? I applied the trunk patch to 0.90.4 and aside from one minor flub, the patch was very clean. I left my mapreduce jobs to run overnight and am seeing ZK connections accummulating again, but at a slower rate, so now I'm wondering what differences exist between the changes you made for 0.90.4 versus the one you posted. Thanks!
          Hide
          bryanck Bryan Keller added a comment -

          Sure, I'll post a patch for 0.90.4 in a bit. There have been quite a few changes to ZK connection handling in trunk (deep compare of configs, reference counting), so it is possible the patch might need to be tweaked or the leak is somewhere else.

          Show
          bryanck Bryan Keller added a comment - Sure, I'll post a patch for 0.90.4 in a bit. There have been quite a few changes to ZK connection handling in trunk (deep compare of configs, reference counting), so it is possible the patch might need to be tweaked or the leak is somewhere else.
          Hide
          terry.siu Terry Siu added a comment -

          Thanks, Bryan, looking forward to getting the 0.90.4 patch.

          Show
          terry.siu Terry Siu added a comment - Thanks, Bryan, looking forward to getting the 0.90.4 patch.
          Hide
          bryanck Bryan Keller added a comment -

          Here is the patch I'm using in my production system for 0.90.4. (Sorry it took me so long.)

          Show
          bryanck Bryan Keller added a comment - Here is the patch I'm using in my production system for 0.90.4. (Sorry it took me so long.)
          Hide
          terry.siu Terry Siu added a comment -

          No worries, Bryan. I managed to patch it myself a while back using the tableinput.patch as a guideline. Thanks again!

          Show
          terry.siu Terry Siu added a comment - No worries, Bryan. I managed to patch it myself a while back using the tableinput.patch as a guideline. Thanks again!
          Hide
          bryanck Bryan Keller added a comment -

          The latest Cloudera code introduced the new reference counting connection management. There is a reference counter leak it appears in the HTable constructor, thus you'll see connection leaks again and my patch doesn't fix it. As a hack for now I force the connection to close by using reflection, setting the ref counter to 1, and calling close() on the connection. I do this after calling table.close() in TableInputFormat, TableRecordReader, and TableOutputFormat. I think I should log another bug, as the leak is not in the map reduce classes.

          Show
          bryanck Bryan Keller added a comment - The latest Cloudera code introduced the new reference counting connection management. There is a reference counter leak it appears in the HTable constructor, thus you'll see connection leaks again and my patch doesn't fix it. As a hack for now I force the connection to close by using reflection, setting the ref counter to 1, and calling close() on the connection. I do this after calling table.close() in TableInputFormat, TableRecordReader, and TableOutputFormat. I think I should log another bug, as the leak is not in the map reduce classes.
          Hide
          stack stack added a comment -

          @Bryan Yes please. Thats crazy acrobatics you are at just to close a connection. Thanks.

          Show
          stack stack added a comment - @Bryan Yes please. Thats crazy acrobatics you are at just to close a connection. Thanks.
          Hide
          patrickyu Patrick Yu added a comment -

          One possible workaround for this issue is to reuse the connection for all jobs. However, hbase.connection.per.config must be set to false for connection sharing to work. This flag defaults to true up until HBASE-4508, though.

          Show
          patrickyu Patrick Yu added a comment - One possible workaround for this issue is to reuse the connection for all jobs. However, hbase.connection.per.config must be set to false for connection sharing to work. This flag defaults to true up until HBASE-4508 , though.
          Hide
          lhofhansl Lars Hofhansl added a comment -

          @Patrik: In 0.92+ you can create a standalone HConnection (HConnectionManager.createConnection(...)) and then create HTables using that HConnection. See HBASE-4805.

          Show
          lhofhansl Lars Hofhansl added a comment - @Patrik: In 0.92+ you can create a standalone HConnection (HConnectionManager.createConnection(...)) and then create HTables using that HConnection. See HBASE-4805 .
          Hide
          lhofhansl Lars Hofhansl added a comment -

          What should we do with this?

          Show
          lhofhansl Lars Hofhansl added a comment - What should we do with this?
          Hide
          esteban Esteban Gutierrez added a comment -

          The ZK connection leaks have been addressed multiple times over the last few years, see HBASE-14485, HBASE-15803, HBASE-16117. Also, most of the connection manager code has been refactored.

          Show
          esteban Esteban Gutierrez added a comment - The ZK connection leaks have been addressed multiple times over the last few years, see HBASE-14485 , HBASE-15803 , HBASE-16117 . Also, most of the connection manager code has been refactored.

            People

            • Assignee:
              Unassigned
              Reporter:
              bryanck Bryan Keller
            • Votes:
              6 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development