Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.94.4
    • Fix Version/s: None
    • Component/s: Thrift
    • Labels:

      Description

      ThriftHBaseServiceHandler.openScanner() does this:
      1. table = pool.getTable()
      2. scanner = table.getScanner()
      3. table.close()
      4. return scanner

      While backporting the Thrift server to 0.92.6, I found that table.close() calls connection.close(). Further calls to scanner.next() then raise a ConnectionClosed exception. The unit tests do not catch this since they reuse an open HConnection instance.

      This might happen to work on trunk, but it depends on the implementations of HTablePool, HTable and HConnectionManager. Even with the pool wrapper, if the pool is full, table.close() may be called, which may invalidate the table. Also, HTable is not thread-safe, yet instances are reused since they go back into the pool.

      I suggest storing the table handle along with the scanner.
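      For illustration, a minimal sketch of that suggestion (the OpenScanner wrapper and its names are hypothetical, against the 0.94-era client API): keep the table handle alive alongside the scanner instead of closing it immediately.

        import java.io.IOException;
        import org.apache.hadoop.hbase.client.HTableInterface;
        import org.apache.hadoop.hbase.client.ResultScanner;

        // Hypothetical sketch of the suggested fix: hold the table handle
        // alongside the scanner so the connection stays referenced.
        class OpenScanner {
          final HTableInterface table;   // not closed until the scanner is done
          final ResultScanner scanner;

          OpenScanner(HTableInterface table, ResultScanner scanner) {
            this.table = table;
            this.scanner = scanner;
          }

          void close() throws IOException {
            scanner.close();
            table.close();  // only now return the table / drop the connection ref
          }
        }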


          Activity

          Lars George added a comment -

          I tried to figure out what the status quo is here and created a test app (https://github.com/larsgeorge/hbase-scanner-test) which simulates the discussed issue. I am testing this on 0.94 to see what happens from there on out. I found that it is pretty much impossible to close the connection, since many classes hold a reference to it. It is also unlikely that you have a separate config instance and would therefore cause a second connection to be tracked.

          Also, as suspected, scanners do not share anything with the table instance they were created from besides the connection. So while it is possible to close the table and reduce the reference counter, it is unlikely, for the above reasons (and per my testing), that this causes issues. Since they only share the connection, ResultScanners are thread-safe (go figure).

          I am closing this issue for now as not relevant; please reopen if someone feels I missed something.

          Lars George added a comment -

          But this does point me to the actual pool size setting:

            ThriftHBaseServiceHandler(Configuration conf) {
              htablePool = new HTablePool(conf, Integer.MAX_VALUE);
            }
          

          which means it is effectively unbounded. That seems wrong too. I created HBASE-8948 to track that separately.
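
          For comparison, bounding the pool would be a one-line change; a sketch follows, where the "hbase.thrift.htablepool.size.max" key is hypothetical, not an existing HBase property.

            // Hypothetical: bound the pool instead of Integer.MAX_VALUE.
            // The configuration key below is illustrative only.
            ThriftHBaseServiceHandler(Configuration conf) {
              int maxSize = conf.getInt("hbase.thrift.htablepool.size.max", 1000);
              htablePool = new HTablePool(conf, maxSize);
            }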

          Lars George added a comment - edited

          If the pool is full, table.close() may be called by HTablePool. This calls connection.close(), but ResultScanner calls table.getConnection().

          No Devin, that was my point earlier: the connection.close() call uses a reference counter. It does not close the connection while there are other "users". And in the case of Thrift, the handler still has the pool open, and therefore the connection will not be closed under the hood.

          You do not have to set the pool size any differently, or make it any larger at all.

          As long as the HTablePool is not closed, all is fine.
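
          For illustration, a minimal sketch of the reference-counting behavior described above; the class and names are simplified stand-ins, not HConnectionManager's actual implementation.

            // Simplified sketch of reference-counted connection closing,
            // loosely modeled on the behavior described above.
            class CountedConnection {
              private int refCount = 0;
              private boolean closed = false;

              synchronized void acquire() {   // e.g. when an HTable is created
                refCount++;
              }

              synchronized void release() {   // e.g. when table.close() is called
                if (--refCount <= 0) {
                  closed = true;              // the real close happens only at zero
                }
              }

              synchronized boolean isClosed() {
                return closed;
              }
            }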

          Devin Bayer added a comment -

          Hi Lars - what about this:

          If the pool is full, table.close() may be called by HTablePool. This calls connection.close(), but ResultScanner calls table.getConnection().

          OK, maybe it will happen to work due to various coincidences, but it's not obvious it will work 100% of the time. For sure we must not change the implementations of HTable, HConnection or HTablePool. We must also set the pool size greater than the Thrift thread pool size.

          ClientScanner is an inner class and does retain a reference to the HTable instance.

          I think it's just asking for trouble.

          Lars George added a comment - edited

          Looking at HBASE-4205, where this wording was added, it does not explain the reasons. For writing this is clear, but not for reading. The HTable from back then (0.90.4) has changed a bit and is now given the connection reference directly, as opposed to calling the method of the wrapping HTable instance. I'll raise this question on dev@ to confirm the status quo.

          But, assuming the reading path is thread-safe, we should be fine overall, no? Then using the pool does the job needed, keeping the connection open. In other words, for 0.94+ we do not need your patch? Not wanting to ruin the show here, please forgive me.

          Lars George added a comment -

          Hi Devin, I would not be surprised if this is outdated. I know that for writing, the local client-side write buffer is an issue, since it is not thread-safe. But for reading, this is news to me. Looking at the code, the ClientScanner has no reference to the HTable, and it makes RPC requests to the RegionServers directly through the shared connection. I think we should raise a new JIRA to fix the JavaDoc for the HTable class and make sure it is current.

          Devin Bayer added a comment -

          If you read the HTable JavaDoc, it says:

          This class is not thread safe for reads nor write.
          ...
          In case of reads, some fields used by a Scan are shared among all threads. The HTable implementation can either not contract to be safe in case of a Get

          That does not inspire confidence that scanners are, or will remain, thread-safe - in fact, it suggests the opposite.

          Lars George added a comment -

          Hey Devin Bayer,

          If we just put HTable back into the pool, I think the connection count will never be reduced, so it will never reach zero and we'll be OK.

          Yes, that was my point: since the connection is always used by some tables, we should be fine.

          However, the HTable interface doesn't guarantee thread safety between scanners, so we can't share them between threads anyway.

          That was my earlier point: this does not matter with scanners. You can return the table to the pool and another thread can use it fine - given only one thread at a time uses it, which is guaranteed by the default PoolMap in HTablePool. The scanner (i.e. the ClientScanner instance) is independent and only shares the connection, which is shared by all tables anyway. Could you elaborate on where you think this is flawed?
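
          As an illustration, a sketch of the pattern under discussion against the 0.94-era client API; "mytable" and the loop body are placeholders.

            // Sketch: the table goes back to the pool while the scanner keeps
            // running, since the ClientScanner only depends on the shared connection.
            HTablePool pool = new HTablePool(conf, 10);
            HTableInterface table = pool.getTable("mytable");
            ResultScanner scanner = table.getScanner(new Scan());
            table.close();                 // returns the pooled table; the shared
                                           // connection stays open (ref count > 0)
            for (Result result : scanner) {
              // scanner still works: it talks to the RegionServers through the
              // shared connection, not through the returned table
            }
            scanner.close();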

          Devin Bayer added a comment -

          Hi Lars:

          If we just put the HTable back into the pool, I think the connection count will never be reduced, so it will never reach zero and we'll be OK. If that isn't guaranteed by HTablePool, we must wait until the Thrift connection is closed.

          However, the HTable interface doesn't guarantee thread safety between scanners, so we can't share them between threads anyway.

          We might as well just hold onto the HTable until the Thrift connection is closed, or we're really getting too deep into HBase internals.

          Lars George added a comment -

          Connections are reference counted; see HConnectionManager for reference. So, given the issues we had with connection identities in there have been fixed, we should be OK to close a table, as the connection count will be decremented? The scanner is a non-counted reference, but only when all tables are closed and the reference count reaches zero does the connection really get closed. By that time the scanners should all be done?

          The other option we discussed is to hold on to the table for the scanner, but then we have one for every scanner instance, which is way too much under load. We only need to hold on to one per table, thereby keeping the umbilical cord alive (so to speak).
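
          A sketch of that one-handle-per-table idea; the field and method names are hypothetical, not from any patch in this issue.

            import java.io.IOException;
            import java.util.concurrent.ConcurrentHashMap;
            import java.util.concurrent.ConcurrentMap;
            import org.apache.hadoop.hbase.client.HTableInterface;

            // Hypothetical: keep one HTable handle alive per table name so the
            // shared connection's reference count never drops to zero while
            // scanners are running.
            private final ConcurrentMap<String, HTableInterface> anchorTables =
                new ConcurrentHashMap<String, HTableInterface>();

            HTableInterface anchorFor(String tableName) throws IOException {
              HTableInterface anchor = anchorTables.get(tableName);
              if (anchor == null) {
                HTableInterface created = htablePool.getTable(tableName);
                anchor = anchorTables.putIfAbsent(tableName, created);
                if (anchor == null) {
                  anchor = created;
                } else {
                  created.close();  // lost the race; return the extra handle
                }
              }
              return anchor;
            }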

          Lars George added a comment -

          Hamed Madani, ah sorry, you meant the PooledHTable wrapper handed out by the pool. Makes sense now. Devin Bayer is right, though: this has changed over time, but what we are concerned about today all shares the same wrapper, i.e. HTablePool at the head of the 0.94 branch.

          So we are really facing the above question then: should we hold on to the table instance or not? I am looking into that now, but please let me know your thoughts.

          Lars George added a comment -

          Hamed Madani, I am not sure what you are looking at. HTable has no reference to any HTablePool instance, so calling HTable.close() does not return it to the pool. There is an internal pool of worker threads handled in HTable.close(), but that is unrelated.

          Devin Bayer, I see what you are saying - and of course we need to hold on to the table instances, either using HTablePool or, as Thrift 1 does, with our own list (which you do, of sorts).

          Your patch changes the behaviour so that when a scanner is opened, the HTable instance is not returned to the pool until you close the scanner.

          The first issue you describe is that when you return the table, you might run into the table being reused although it is not thread-safe. That should not be an issue, since you are no longer using the table but a scanner instance. I think the second issue you describe is the real problem: when a table is closed, the underlying connection may be closed, leaving the scanner dangling.

          I need to look into 0.94, 0.95 and trunk/0.98 to see what the status is (as you and Hamed touch upon above). If we decide to hang on to the table, we could fold this into HBASE-3852, which creates a scanner wrapper holding a last-used time and the actual result scanner. We could easily add the table instance there and hold on to both in one map. Same thing as yours, just a merge.
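
          A minimal sketch of such a combined wrapper; the class and field names are hypothetical, and HBASE-3852 describes the actual design.

            import java.io.IOException;
            import org.apache.hadoop.hbase.client.HTableInterface;
            import org.apache.hadoop.hbase.client.ResultScanner;

            // Hypothetical sketch: keep the scanner, the table it came from,
            // and a last-used timestamp together in one map entry.
            class ScannerEntry {
              final ResultScanner scanner;
              final HTableInterface table;   // held so the connection stays referenced
              volatile long lastUsed;

              ScannerEntry(ResultScanner scanner, HTableInterface table) {
                this.scanner = scanner;
                this.table = table;
                this.lastUsed = System.currentTimeMillis();
              }

              void close() throws IOException {
                scanner.close();
                table.close();               // now it is safe to return the table
              }
            }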

          Devin Bayer added a comment -

          Lars: That's a mistake - we should use HTablePool to avoid recreating the HTable instances.

          Hamed: It depends on the HBase version. As this was about backporting to 0.94.4, putTable() was needed.

          Hamed Madani added a comment -

          Looking at HTablePool.java, I don't think it's necessary to change the closeTable() function. Both table.close() and htablePool.putTable(table) call returnTable(). From what I see, putTable() is deprecated.

          Lars George added a comment -

          Hey Devin Bayer, so this is more like Thrift 1, which stores the tables in a ThreadLocal instance to keep them open, not using the HTablePool at all? In other words, you remove the pool here as well, keeping the table for each scanner separately? Sounds reasonable to me.

          Devin Bayer added a comment -

          stack: patch is attached. It does work for us.

          stack added a comment -

          You have a patch for us, Devin? You using thrift2? It's working for you?

          Devin Bayer added a comment -

          The open tables should also not be leaked.


  People

    • Assignee: Lars George
    • Reporter: Devin Bayer
    • Votes: 0
    • Watchers: 3
