HBase
  1. HBase
  2. HBASE-5399

Cut the link between the client and the zookeeper ensemble

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.95.0
    • Component/s: Client
    • Labels:
      None
    • Environment:

      all

    • Hadoop Flags:
      Reviewed

      Description

      The link is often considered as an issue, for various reasons. One of them being that there is a limit on the number of connection that ZK can manage. Stack was suggesting as well to remove the link to master from HConnection.

      There are choices to be made considering the existing API (that we don't want to break).

      The first patches I will submit on hadoop-qa should not be committed: they are here to show the progress on the direction taken.

      ZooKeeper is used for:

      • public getter, to let the client do whatever he wants, and close ZooKeeper when closing the connection => we have to deprecate this but keep it.
      • read get master address to create a master => now done with a temporary zookeeper connection
      • read root location => now done with a temporary zookeeper connection, but questionable. Used in public function "locateRegion". To be reworked.
      • read cluster id => now done once with a temporary zookeeper connection.
      • check if base done is available => now done once with a zookeeper connection given as a parameter
      • isTableDisabled/isTableAvailable => public functions, now done with a temporary zookeeper connection.
      • Called internally from HBaseAdmin and HTable
      • getCurrentNrHRS(): public function to get the number of region servers and create a pool of thread => now done with a temporary zookeeper connection
        -

      Master is used for:

      • getMaster public getter, as for ZooKeeper => we have to deprecate this but keep it.
      • isMasterRunning(): public function, used internally by HMerge & HBaseAdmin
      • getHTableDescriptor*: public functions offering access to the master. => we could make them using a temporary master connection as well.

      Main points are:

      • hbase class for ZooKeeper; ZooKeeperWatcher is really designed for a strongly coupled architecture . This can be changed, but requires a lot of modifications in these classes (likely adding a class in the middle of the hierarchy, something like that). Anyway, non connected client will always be really slower, because it's a tcp connection, and establishing a tcp connection is slow.
      • having a link between ZK and all the client seems to make sense for some Use Cases. However, it won't scale if a TCP connection is required for every client
      • if we move the table descriptor part away from the client, we need to find a new place for it.
      • we will have the same issue if HBaseAdmin (for both ZK & Master), may be we can put a timeout on the connection. That would make the whole system less deterministic however.
      1. 5399_inprogress.patch
        34 kB
        Nicolas Liochon
      2. 5399_inprogress.v3.patch
        82 kB
        Nicolas Liochon
      3. 5399_inprogress.v9.patch
        90 kB
        Nicolas Liochon
      4. ASF.LICENSE.NOT.GRANTED--5399_inprogress.v14.patch
        92 kB
        Nicolas Liochon
      5. 5399_inprogress.v16.patch
        93 kB
        Nicolas Liochon
      6. 5399_inprogress.v18.patch
        98 kB
        Nicolas Liochon
      7. 5399_inprogress.v20.patch
        95 kB
        Nicolas Liochon
      8. 5399_inprogress.v21.patch
        97 kB
        Nicolas Liochon
      9. 5399_inprogress.v23.patch
        98 kB
        Nicolas Liochon
      10. 5399.v27.patch
        100 kB
        Nicolas Liochon
      11. 5399_inprogress.v32.patch
        116 kB
        Nicolas Liochon
      12. 5399.v38.patch
        112 kB
        Nicolas Liochon
      13. 5399.v39.patch
        116 kB
        Nicolas Liochon
      14. 5399.v40.patch
        116 kB
        Nicolas Liochon
      15. 5399.v41.patch
        117 kB
        Nicolas Liochon
      16. 5399.v42.patch
        116 kB
        Nicolas Liochon
      17. 5399.v42.patch
        116 kB
        Nicolas Liochon
      18. 5399.v42.patch
        116 kB
        Nicolas Liochon
      19. 5399.v42.patch
        116 kB
        Nicolas Liochon
      20. nochange.patch
        0.3 kB
        Nicolas Liochon

        Issue Links

          Activity

          Hide
          stack added a comment -

          Hurray on below:

          +  @Deprecated
             public ZooKeeperWatcher getZooKeeperWatcher() throws IOException;
          
          ...
          
          +   * @deprecated Removed because it was a mistake exposing zookeeper in this
          +   * interface (ZooKeeper is an implementation detail).
              */
          +  @Deprecated
             public HMasterInterface getMaster()
          

          Why are these deprecated? (Add why to the deprecated note – add pointer to where user can get functionality elsewise):

          +  @Deprecated
             public HRegionInterface getHRegionConnection(HServerAddress regionServer)
          

          Fix your comments in HCM. Missing 'e' on 'not' and 'd' on 'use'

          Do we get the clusterid on connection setup? Do we have to? Can we just get that when someone asks for it?

          Fix...

            // We will ope/close a ZooKeeper 
          

          What about isTableEnabled, etc., should they be deprecated, moved out of HConnnection? Or that is for a different issue?

          Should we be using straight ZooKeeper instead of ZooKeeperWatcher? We don't need watch facility?

          So far so good...

          Show
          stack added a comment - Hurray on below: + @Deprecated public ZooKeeperWatcher getZooKeeperWatcher() throws IOException; ... + * @deprecated Removed because it was a mistake exposing zookeeper in this + * interface (ZooKeeper is an implementation detail). */ + @Deprecated public HMasterInterface getMaster() Why are these deprecated? (Add why to the deprecated note – add pointer to where user can get functionality elsewise): + @Deprecated public HRegionInterface getHRegionConnection(HServerAddress regionServer) Fix your comments in HCM. Missing 'e' on 'not' and 'd' on 'use' Do we get the clusterid on connection setup? Do we have to? Can we just get that when someone asks for it? Fix... // We will ope/close a ZooKeeper What about isTableEnabled, etc., should they be deprecated, moved out of HConnnection? Or that is for a different issue? Should we be using straight ZooKeeper instead of ZooKeeperWatcher? We don't need watch facility? So far so good...
          Hide
          Nicolas Liochon added a comment -

          Why are these deprecated: getHRegionConnection(HServerAddress regionServer)

          It's because HServerAddress is deprecated. I will check that all the redirection are there.

          Do we get the clusterid on connection setup? Do we have to? Can we just get that when someone asks for it?

          Today yes, but we also manage the case when there is no id.

          Should we be using straight ZooKeeper instead of ZooKeeperWatcher? We don't need watch facility?

          We can do that, it requires some work to do it well (ZooKeeperWatcher has spread too much, and has a lot of responsibilities (for example, it's the owner of the znode names, created from the config parameters)). It would be much cleaner, and a little bit faster. We would still pay for the tcp connection however.

          isTableEnabled, etc., should they be deprecated, moved out of HConnnection

          I was thinking putting them in HBaseAdmin, does it makes sense?

          Show
          Nicolas Liochon added a comment - Why are these deprecated: getHRegionConnection(HServerAddress regionServer) It's because HServerAddress is deprecated. I will check that all the redirection are there. Do we get the clusterid on connection setup? Do we have to? Can we just get that when someone asks for it? Today yes, but we also manage the case when there is no id. Should we be using straight ZooKeeper instead of ZooKeeperWatcher? We don't need watch facility? We can do that, it requires some work to do it well (ZooKeeperWatcher has spread too much, and has a lot of responsibilities (for example, it's the owner of the znode names, created from the config parameters)). It would be much cleaner, and a little bit faster. We would still pay for the tcp connection however. isTableEnabled, etc., should they be deprecated, moved out of HConnnection I was thinking putting them in HBaseAdmin, does it makes sense?
          Hide
          stack added a comment -

          We can do that, it requires some work to do it well (ZooKeeperWatcher has spread too much, and has a lot of responsibilities (for example, it's the owner of the znode names, created from the config parameters)). It would be much cleaner, and a little bit faster. We would still pay for the tcp connection however.

          Ok. For another patch then I'd say (Agree ZKW is like a dumping ground for zk ops)

          isTableEnabled, etc., should they be deprecated, moved out of HConnnection .... I was thinking putting them in HBaseAdmin, does it makes sense?

          I think it makes sense... deprecate them in HCM and move to HBaseAdmin....

          Sweet.

          Show
          stack added a comment - We can do that, it requires some work to do it well (ZooKeeperWatcher has spread too much, and has a lot of responsibilities (for example, it's the owner of the znode names, created from the config parameters)). It would be much cleaner, and a little bit faster. We would still pay for the tcp connection however. Ok. For another patch then I'd say (Agree ZKW is like a dumping ground for zk ops) isTableEnabled, etc., should they be deprecated, moved out of HConnnection .... I was thinking putting them in HBaseAdmin, does it makes sense? I think it makes sense... deprecate them in HCM and move to HBaseAdmin.... Sweet.
          Hide
          Nicolas Liochon added a comment -

          After some thinking and different tries; I believe it makes sense to have a shared master connection and a shared zookeeper connection with a reference counter. The creation/release/close function would be in HConnection. We could add the refCounter in ZooKeeperWatcher, but it's more difficult to add in the proxy (I could do it however).

          Ideally, there would be a keep alive, i.e. we would not close the physical connection after when the counter becomes zero, but wait a minute or so. I'am just not sure on how to distinguish cleanly a simple release to a jvm shutdown.

          Patch is underway, I think the complete version for review will be ready tomorrow.

          However, I have not found a new place for the table descriptor part, today it's still in the connection.

          Show
          Nicolas Liochon added a comment - After some thinking and different tries; I believe it makes sense to have a shared master connection and a shared zookeeper connection with a reference counter. The creation/release/close function would be in HConnection. We could add the refCounter in ZooKeeperWatcher, but it's more difficult to add in the proxy (I could do it however). Ideally, there would be a keep alive, i.e. we would not close the physical connection after when the counter becomes zero, but wait a minute or so. I'am just not sure on how to distinguish cleanly a simple release to a jvm shutdown. Patch is underway, I think the complete version for review will be ready tomorrow. However, I have not found a new place for the table descriptor part, today it's still in the connection.
          Hide
          Nicolas Liochon added a comment -

          It's not the last version (it needs more comments, unit tests and likely bug fixes), but there is already a lot. Master & ZooKeeper connection are now created only when necessary, and are closed if not used for 5 minutes.
          I added the keep alive stuff. It's not a nice to have; without it the unit tests take twice more time.
          There is an issue with the masterCheck part, the previous behavior was strange. I need to review it in details.
          The patch is on monday trunk. I will make it compatible on current trunk this week-end.
          I will move isTableEnabled & so on in an other patch, this one is already too big...

          Show
          Nicolas Liochon added a comment - It's not the last version (it needs more comments, unit tests and likely bug fixes), but there is already a lot. Master & ZooKeeper connection are now created only when necessary, and are closed if not used for 5 minutes. I added the keep alive stuff. It's not a nice to have; without it the unit tests take twice more time. There is an issue with the masterCheck part, the previous behavior was strange. I need to review it in details. The patch is on monday trunk. I will make it compatible on current trunk this week-end. I will move isTableEnabled & so on in an other patch, this one is already too big...
          Hide
          stack added a comment -

          I don't follow:

          +    // todo stack nkeywal
          +    // We used to check in a loop that the master was running.
          +    // 1) There were imbricated "tries" loop. One here and one in getMaster
          +    // 2) As the master can disappear, it may not be necessary to check it.
          +    // We don't need the connection immediately, but we used to check the
          +    //  connection at the beginning in the past, and it's more user friendly
          +    //  to have the error immediately.
          +    // connection.isMasterRunning();
          

          I'd say lets not check master is there till we need it. Seems like a PITA going ahead and checking master on construction. This changes the behavior but I think its one thats ok to change.

          You are doing your own Callables. You've seen the Callables that go on in HTable. Any reason you avoid them? I suppose this is different in that you want to let go of the shared master. Looks fine.

          Are we getting retrieveClusterId on startup? Can we not do that? Can we do that when someone asks for it? Or is it happening after we've set up the zk connection anyways?

          Patch makes sense so far. Good stuff N.

          Show
          stack added a comment - I don't follow: + // todo stack nkeywal + // We used to check in a loop that the master was running. + // 1) There were imbricated "tries" loop. One here and one in getMaster + // 2) As the master can disappear, it may not be necessary to check it. + // We don't need the connection immediately, but we used to check the + // connection at the beginning in the past, and it's more user friendly + // to have the error immediately. + // connection.isMasterRunning(); I'd say lets not check master is there till we need it. Seems like a PITA going ahead and checking master on construction. This changes the behavior but I think its one thats ok to change. You are doing your own Callables. You've seen the Callables that go on in HTable. Any reason you avoid them? I suppose this is different in that you want to let go of the shared master. Looks fine. Are we getting retrieveClusterId on startup? Can we not do that? Can we do that when someone asks for it? Or is it happening after we've set up the zk connection anyways? Patch makes sense so far. Good stuff N.
          Hide
          Nicolas Liochon added a comment -

          Are we getting retrieveClusterId on startup? Can we not do that? Can we do that when someone asks for it? Or is it happening after we've set up the zk connection anyways?

          We're doing it when we create the connection, so we're connection for ZK just for this today. However, I think that we should also connect to ZK to get the master address (this would allow to start a client without specifying the master address: today it must be in ZK but also in client configuration). Wet would still connect to ZK at starting time, so, but for two reasons instead of only one.

          On another hand, There's a comment from Gary on the bug review tracker (https://review.cloudera.org//r/1669/) for HBASE-3677:
          Yes, no real need for a tracker. Andy and I had discussed that previously. ZK is just used to broadcast the ID to clients and servers without the need for HBase RPC (which for token authentication requires selection of the right token, but we don't know the token without the ID...). I'll post an update that just reads the ID from ZK without using a tracker.

          So It seems that we could safely:

          • remove the clusterIdWatcher class
          • replace the conf parameter "hbase.cluster.id" by a simple getter on Connection. That would allow to get the id only when asked, with a lazy initialization, hence connecting later to ZK.

          I don't mind doing this, but it's better to do it in another jira (this one's too big already).

          Show
          Nicolas Liochon added a comment - Are we getting retrieveClusterId on startup? Can we not do that? Can we do that when someone asks for it? Or is it happening after we've set up the zk connection anyways? We're doing it when we create the connection, so we're connection for ZK just for this today. However, I think that we should also connect to ZK to get the master address (this would allow to start a client without specifying the master address: today it must be in ZK but also in client configuration). Wet would still connect to ZK at starting time, so, but for two reasons instead of only one. On another hand, There's a comment from Gary on the bug review tracker ( https://review.cloudera.org//r/1669/ ) for HBASE-3677 : Yes, no real need for a tracker. Andy and I had discussed that previously. ZK is just used to broadcast the ID to clients and servers without the need for HBase RPC (which for token authentication requires selection of the right token, but we don't know the token without the ID...). I'll post an update that just reads the ID from ZK without using a tracker. So It seems that we could safely: remove the clusterIdWatcher class replace the conf parameter "hbase.cluster.id" by a simple getter on Connection. That would allow to get the id only when asked, with a lazy initialization, hence connecting later to ZK. I don't mind doing this, but it's better to do it in another jira (this one's too big already).
          Hide
          Nicolas Liochon added a comment -

          In v9 I added some comments and fixed some issues.

          I'd say lets not check master is there till we need it. Seems like a PITA going ahead and checking master on construction. This changes the behavior but I think its one thats ok to change.

          Agreed, removed in v9.

          You are doing your own Callables. You've seen the Callables that go on in HTable. Any reason you avoid them? I suppose this is different in that you want to let go of the shared master. Looks fine.

          I could also add the master management in the HTable callables. I would still need the one I wrote for HBaseAdmin, but it could be useful to be able to use master from the HTable callables?

          I still need to work on the unit tests, some large tests are failing today:

          Failed tests:   testBackgroundEvictionThread[1](org.apache.hadoop.hbase.io.hfile
          .TestLruBlockCache)
            testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.Tes
          tSplitTransactionOnCluster): expected:<1> but was:<0>
            queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too
           much time for queueFailover replication
            testMergeTool(org.apache.hadoop.hbase.util.TestMergeTool): 'merging regions 0
          and 1' failed with errCode -1
          
          Tests in error:
            test3686a(org.apache.hadoop.hbase.client.TestScannerTimeout): 64142ms passed s
          ince the last invocation, timeout is currently set to 10000
          

          For some of them it could be random and unrelated, but I am sure there are some real issues for of them.
          I am on vacations next week; so I will come back to it the week after...

          Show
          Nicolas Liochon added a comment - In v9 I added some comments and fixed some issues. I'd say lets not check master is there till we need it. Seems like a PITA going ahead and checking master on construction. This changes the behavior but I think its one thats ok to change. Agreed, removed in v9. You are doing your own Callables. You've seen the Callables that go on in HTable. Any reason you avoid them? I suppose this is different in that you want to let go of the shared master. Looks fine. I could also add the master management in the HTable callables. I would still need the one I wrote for HBaseAdmin, but it could be useful to be able to use master from the HTable callables? I still need to work on the unit tests, some large tests are failing today: Failed tests: testBackgroundEvictionThread[1](org.apache.hadoop.hbase.io.hfile .TestLruBlockCache) testShutdownFixupWhenDaughterHasSplit(org.apache.hadoop.hbase.regionserver.Tes tSplitTransactionOnCluster): expected:<1> but was:<0> queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for queueFailover replication testMergeTool(org.apache.hadoop.hbase.util.TestMergeTool): 'merging regions 0 and 1' failed with errCode -1 Tests in error: test3686a(org.apache.hadoop.hbase.client.TestScannerTimeout): 64142ms passed s ince the last invocation, timeout is currently set to 10000 For some of them it could be random and unrelated, but I am sure there are some real issues for of them. I am on vacations next week; so I will come back to it the week after...
          Hide
          Nicolas Liochon added a comment -

          Moreover, v9 patch work on trunk as of today.

          Show
          Nicolas Liochon added a comment - Moreover, v9 patch work on trunk as of today.
          Hide
          Lars Hofhansl added a comment -

          @Nicolas: Could you upload the patch to RB? That would make it easier to review.
          I'm loving all changes discussed here so far

          Show
          Lars Hofhansl added a comment - @Nicolas: Could you upload the patch to RB? That would make it easier to review. I'm loving all changes discussed here so far
          Hide
          Nicolas Liochon added a comment -

          @Lars and all: done see https://reviews.apache.org/r/3967/
          I sent it to the whole hbase group, I hope it's the right thing to do.

          I'm on vacation this week so I will see the comments after the 27th...

          Show
          Nicolas Liochon added a comment - @Lars and all: done see https://reviews.apache.org/r/3967/ I sent it to the whole hbase group, I hope it's the right thing to do. I'm on vacation this week so I will see the comments after the 27th...
          Hide
          stack added a comment -

          @N Have a good holiday. When you get back, this seems good to do... can do in another issue:

          remove the clusterIdWatcher class
          replace the conf parameter "hbase.cluster.id" by a simple getter on Connection. That would allow to get the id only when asked, with a lazy initialization, hence connecting later to ZK.
          

          I also like the bit where we remove master location as required in conf – rather, the only required conf is zk ensemble and get the master there.

          On reusing HTable Callables doing master stuff, lets not pollute HTable with master-isms so on '...but it could be useful to be able to use master from the HTable callables?'... I'd say no.

          Show
          stack added a comment - @N Have a good holiday. When you get back, this seems good to do... can do in another issue: remove the clusterIdWatcher class replace the conf parameter "hbase.cluster.id" by a simple getter on Connection. That would allow to get the id only when asked, with a lazy initialization, hence connecting later to ZK. I also like the bit where we remove master location as required in conf – rather, the only required conf is zk ensemble and get the master there. On reusing HTable Callables doing master stuff, lets not pollute HTable with master-isms so on '...but it could be useful to be able to use master from the HTable callables?'... I'd say no.
          Hide
          stack added a comment -

          Another thought:

          Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them – you'd have not way of counting zk and master connections?)

          Show
          stack added a comment - Another thought: Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them – you'd have not way of counting zk and master connections?)
          Hide
          Nicolas Liochon added a comment -

          Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them – you'd have not way of counting zk and master connections?)

          Yes, we need the user to explicitly release the connection, as we can't hide that we managing the object life cycle. I would prefer to use "close" for this, but I didn't find an easy way to extend the master proxy to make it closeable. So I preferred to use "release" for both objects to make it consistent.

          Show
          Nicolas Liochon added a comment - Do we have to have the getSharedZookeeperWatcher and releaseSharedZookeeperWatcher and getSharedMaster, etc., in the HConnection API? Are these not implementation details? (Or would it be too hard to undo them – you'd have not way of counting zk and master connections?) Yes, we need the user to explicitly release the connection, as we can't hide that we managing the object life cycle. I would prefer to use "close" for this, but I didn't find an easy way to extend the master proxy to make it closeable. So I preferred to use "release" for both objects to make it consistent.
          Hide
          stack added a comment -

          ...but I didn't find an easy way to extend the master proxy to make it closeable

          What is the issue w/ the above? (I wonder why its hard to do?)

          Show
          stack added a comment - ...but I didn't find an easy way to extend the master proxy to make it closeable What is the issue w/ the above? (I wonder why its hard to do?)
          Hide
          stack added a comment -

          Ditto w/ zk? Can't we just add close to the HConnection Interface and it will decrement the ref counting?

          Show
          stack added a comment - Ditto w/ zk? Can't we just add close to the HConnection Interface and it will decrement the ref counting?
          Hide
          Nicolas Liochon added a comment -

          yep, for zk it easy.
          For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no?

          Show
          Nicolas Liochon added a comment - yep, for zk it easy. For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no?
          Hide
          stack added a comment -

          For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no?

          .... to add the close? (I am not following closely but would like to understand if possible so throw me a clue or two on what issue is). Thanks N.

          Show
          stack added a comment - For HMasterInterface, I don't know: I need to modify the interface but also HBaseRPC.getProxy and then VersionedProtocol and so on, no? .... to add the close? (I am not following closely but would like to understand if possible so throw me a clue or two on what issue is). Thanks N.
          Hide
          Nicolas Liochon added a comment -

          yes
          The user would code something like:

              HMasterInterface master = connection.getSharedMaster();
              try {
                master.move(encodedRegionName, destServerName);
              } finally {
                master.close();
              }
          

          HMasterInterface is an interface, with a proxy sending the calls to the master server.
          I would need to add a "close" that would not be a remote call but would decrement a reference counter.

          I could add another proxy object, but it's not very clean (it should work however, may be it's an option).
          If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense?

          Show
          Nicolas Liochon added a comment - yes The user would code something like: HMasterInterface master = connection.getSharedMaster(); try { master.move(encodedRegionName, destServerName); } finally { master.close(); } HMasterInterface is an interface, with a proxy sending the calls to the master server. I would need to add a "close" that would not be a remote call but would decrement a reference counter. I could add another proxy object, but it's not very clean (it should work however, may be it's an option). If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense?
          Hide
          stack added a comment -

          .. If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense?

          How would that work (I've wanted to add a method to the returned proxy in the past). Would you have returned proxy implement another Interface (That sounds hard). Make the returned Interface implement Closeable? Or, even, whats wrong w/ the close going remote? Maybe there are resources master-side to clean up (if not now, maybe one day?... though yeah, if client doesn't have to make the RPC, lets not bother if possible). Sounds like something to try and figure – if possible (Of course I've no ideas?)

          BTW, what you have above for conenction w/ try/finally looks ideal

          Show
          stack added a comment - .. If I don't want to do that, I need to add the method in the object returned by getProxy. You think it makes sense? How would that work (I've wanted to add a method to the returned proxy in the past). Would you have returned proxy implement another Interface (That sounds hard). Make the returned Interface implement Closeable? Or, even, whats wrong w/ the close going remote? Maybe there are resources master-side to clean up (if not now, maybe one day?... though yeah, if client doesn't have to make the RPC, lets not bother if possible). Sounds like something to try and figure – if possible (Of course I've no ideas?) BTW, what you have above for conenction w/ try/finally looks ideal
          Hide
          Nicolas Liochon added a comment -

          If we really want it, I found 3 options, and tried 2.

          1) Adding 'close' to the HMasterInterface
          After looking at it, I don't think it's a good option: HMasterInterface is an interface shared between the client & the server. So adding a close function to it would mean the server must implement it, while it's a client side function only. I believe that's the reason why there is already a function 'stopProxy' in the RPCENgine instead of a close function.

          2) Adding the possibility to have a delayed close in RPCENgine
          Instead of doing it for HMasterInterface in Connection only, we could do it all proxies and code this in RPCENgine.
          There is already a reference counting in the hbase RPCENgine. So we could add here some code to allow a delayed close. I don't see why it would not be possible, all the code seems to be in the hbase package (and not hadoop). This would require smart convention to make it configurable on a per proxy basis, but it should work.

          3) Add an class with a delegation
          So I've got this

          public interface SharedMaster extends HMasterInterface, Closeable {}
          

          With this in HConnection

          public interface HConnection extends Abortable, Closeable {
            public SharedMaster  getSharedMaster()
          }
          

          Then the client writes

          SharedMaster  master = connection.getSharedMaster();
          try {
             master.move(encodedRegionName, destServerName);
          } finally {
             master.close();
          }
          

          With a java proxy to manage the delegation for us:

              private static class SharedMasterHandler implements InvocationHandler {
                 private HConnectionImplementation connection;
                 private HMasterInterface master;
                SharedMasterHandler(HConnectionImplementation connection, HMasterInterface master){
                  this.connection = connection;
                  this.master = master;
                }
          
                @Override
                public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                  if (method.getName().equals("close")){
                    connection.releaseSharedMaster((HMasterInterface)master);
                    return null;
                  } else {
                    return method.invoke(master, args);
                  }
                }
              }
          

          releaseSharedMaster is private in this solution.

          It was not really my first idea, but it's a reasonable way to get to the objective. The reflective delegation is not fast, obviously, but it doesn't matter here as there is much more expensive remote call just after...

          I'am currently testing it, it seems to work.

          Show
          Nicolas Liochon added a comment - If we really want it, I found 3 options, and tried 2. 1) Adding 'close' to the HMasterInterface After looking at it, I don't think it's a good option: HMasterInterface is an interface shared between the client & the server. So adding a close function to it would mean the server must implement it, while it's a client side function only. I believe that's the reason why there is already a function 'stopProxy' in the RPCENgine instead of a close function. 2) Adding the possibility to have a delayed close in RPCENgine Instead of doing it for HMasterInterface in Connection only, we could do it all proxies and code this in RPCENgine. There is already a reference counting in the hbase RPCENgine. So we could add here some code to allow a delayed close. I don't see why it would not be possible, all the code seems to be in the hbase package (and not hadoop). This would require smart convention to make it configurable on a per proxy basis, but it should work. 3) Add an class with a delegation So I've got this public interface SharedMaster extends HMasterInterface, Closeable {} With this in HConnection public interface HConnection extends Abortable, Closeable { public SharedMaster getSharedMaster() } Then the client writes SharedMaster master = connection.getSharedMaster(); try { master.move(encodedRegionName, destServerName); } finally { master.close(); } With a java proxy to manage the delegation for us: private static class SharedMasterHandler implements InvocationHandler { private HConnectionImplementation connection; private HMasterInterface master; SharedMasterHandler(HConnectionImplementation connection, HMasterInterface master){ this.connection = connection; this.master = master; } @Override public Object invoke(Object proxy, Method method, Object[] args) throws Throwable { if (method.getName().equals("close")){ connection.releaseSharedMaster((HMasterInterface)master); return null; } else { return method.invoke(master, args); } } } releaseSharedMaster is private in this solution. It was not really my first idea, but it's a reasonable way to get to the objective. The reflective delegation is not fast, obviously, but it doesn't matter here as there is much more expensive remote call just after... I'am currently testing it, it seems to work.
          Hide
          stack added a comment -

          On 1., yeah, the close should close the connection – a client-side thing....

          On 2., not so mad about it.

          On 3., you obtain the objective it seems but the solution does seem convoluted (more indirection in the client makes it yet more obtuse). Put up a patch I'd say. Lets have a look. SharedMaster is probably not the right name for the Interface? CloseableMaster or MasterConnection and doc that the close applies to the closing of the client connection to master only.

          Good on you N

          Show
          stack added a comment - On 1., yeah, the close should close the connection – a client-side thing.... On 2., not so mad about it. On 3., you obtain the objective it seems but the solution does seem convoluted (more indirection in the client makes it yet more obtuse). Put up a patch I'd say. Lets have a look. SharedMaster is probably not the right name for the Interface? CloseableMaster or MasterConnection and doc that the close applies to the closing of the client connection to master only. Good on you N
          Hide
          Nicolas Liochon added a comment -

          Yes, that's why I rejected it initially. But I can't find a better one.
          Moreover, I can't share the code with ZooKeeperWatcher, but they can have
          the same interface. I can simplify the internal code, but the indirection
          will remain.

          On Tue, Feb 28, 2012 at 8:27 PM, stack (Commented) (JIRA)

          Show
          Nicolas Liochon added a comment - Yes, that's why I rejected it initially. But I can't find a better one. Moreover, I can't share the code with ZooKeeperWatcher, but they can have the same interface. I can simplify the internal code, but the indirection will remain. On Tue, Feb 28, 2012 at 8:27 PM, stack (Commented) (JIRA)
          Hide
          Nicolas Liochon added a comment -

          v16 (still in progress, some unit tests fail, indent & comments to redo & so on), after a discussion with Stack.

          HConnection is a connection to the cluster.
          However, the fact that the cluster is composed of master, zookeeper and region servers should be hidden from the client: some functions need a master some others not. This is not the client problem. Especially, these functions can move (from example, getting a table descriptor, today a master function, could become a region server function). So the getMaster & getZookeeperWatcher, shared or not, should not be in HConnection interface.

          Client functions are today split in two classes HBaseAdmin & HConnection (with stuff like processBatch, listTables, getHTableDescriptor). It could make sense to split HConnection further, but anyway we already have two classes using master, and the master connection should remain shared between these two classes. This should be handled by the HConnection as it is its core responsibility and it's as well much simpler technically. So we need to have package visible function to get them for HBaseAdmin. I prefer to have them in HConnectionImplementation only, even it it implies a cast in HBaseAdmin, as this makes HConnection clean from a client point of view.

          We stick to the keep alive mechanism with the Closeable, and hence a dynamic proxy for master and a subclass for ZooKeeperMaster.

          Note that if master & zookeeper are implementation details, the same should apply to HRegionInterface (HConnection#getHRegionConnection). But there are dependencies from the master package so it can not be included in this jira. The keep alive mechanism could be applied to HRegionInterface as well.

          Show
          Nicolas Liochon added a comment - v16 (still in progress, some unit tests fail, indent & comments to redo & so on), after a discussion with Stack. HConnection is a connection to the cluster. However, the fact that the cluster is composed of master, zookeeper and region servers should be hidden from the client: some functions need a master some others not. This is not the client problem. Especially, these functions can move (from example, getting a table descriptor, today a master function, could become a region server function). So the getMaster & getZookeeperWatcher, shared or not, should not be in HConnection interface. Client functions are today split in two classes HBaseAdmin & HConnection (with stuff like processBatch, listTables, getHTableDescriptor). It could make sense to split HConnection further, but anyway we already have two classes using master, and the master connection should remain shared between these two classes. This should be handled by the HConnection as it is its core responsibility and it's as well much simpler technically. So we need to have package visible function to get them for HBaseAdmin. I prefer to have them in HConnectionImplementation only, even it it implies a cast in HBaseAdmin, as this makes HConnection clean from a client point of view. We stick to the keep alive mechanism with the Closeable, and hence a dynamic proxy for master and a subclass for ZooKeeperMaster. Note that if master & zookeeper are implementation details, the same should apply to HRegionInterface (HConnection#getHRegionConnection). But there are dependencies from the master package so it can not be included in this jira. The keep alive mechanism could be applied to HRegionInterface as well.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516700/5399_inprogress.v18.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1070//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516700/5399_inprogress.v18.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1070//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          The patch is still in progress. Submitted to the build to see how it goes in hadoop-qa.

          Show
          Nicolas Liochon added a comment - The patch is still in progress. Submitted to the build to see how it goes in hadoop-qa.
          Hide
          stack added a comment -

          @N 1 out of 29 hunks FAILED – saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java.rej

          Is it because I just committed fat 4403 patch?

          Show
          stack added a comment - @N 1 out of 29 hunks FAILED – saving rejects to file src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java.rej Is it because I just committed fat 4403 patch?
          Hide
          stack added a comment -

          @N Probably should post the patch up on reviewboard.. its certainly fat enough

          Show
          stack added a comment - @N Probably should post the patch up on reviewboard.. its certainly fat enough
          Hide
          Nicolas Liochon added a comment -
          Show
          Nicolas Liochon added a comment - @stack: done on https://reviews.apache.org/r/3967/
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516702/5399_inprogress.v20.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause mvn compile goal to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1071//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1071//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516702/5399_inprogress.v20.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1071//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1071//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516703/5399_inprogress.v21.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -130 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestScannerTimeout
          org.apache.hadoop.hbase.replication.TestReplicationPeer
          org.apache.hadoop.hbase.util.TestMergeTool
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516703/5399_inprogress.v21.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -130 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 158 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.replication.TestReplicationPeer org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.master.TestSplitLogManager Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1072//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          fyi, here are the results on 4 runs.

          HADOOP-QA 1
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestScannerTimeout
          org.apache.hadoop.hbase.replication.TestReplicationPeer
          org.apache.hadoop.hbase.util.TestMergeTool
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.master.TestSplitLogManager

          Local 1
          org.apache.hadoop.hbase.regionserver.TestAtomicOperation
          org.apache.hadoop.hbase.regionserver.wal.TestHLog
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.replication.TestReplication
          org.apache.hadoop.hbase.util.TestMergeTool
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.client.TestScannerTimeout

          Local 2
          org.apache.hadoop.hbase.regionserver.TestSplitLogWorker
          org.apache.hadoop.hbase.client.TestScannerTimeout
          org.apache.hadoop.hbase.util.TestMergeTool

          Local 3
          org.apache.hadoop.hbase.coprocessor.TestClassLoading
          org.apache.hadoop.hbase.master.TestSplitLogManager
          org.apache.hadoop.hbase.util.TestMergeTool
          org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
          org.apache.hadoop.hbase.io.encoding.TestChangingEncoding
          org.apache.hadoop.hbase.TestZooKeeper
          org.apache.hadoop.hbase.client.TestScannerTimeout
          org.apache.hadoop.hbase.replication.TestReplication

          Show
          Nicolas Liochon added a comment - fyi, here are the results on 4 runs. HADOOP-QA 1 org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.replication.TestReplicationPeer org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.master.TestSplitLogManager Local 1 org.apache.hadoop.hbase.regionserver.TestAtomicOperation org.apache.hadoop.hbase.regionserver.wal.TestHLog org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.replication.TestReplication org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.client.TestScannerTimeout Local 2 org.apache.hadoop.hbase.regionserver.TestSplitLogWorker org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.util.TestMergeTool Local 3 org.apache.hadoop.hbase.coprocessor.TestClassLoading org.apache.hadoop.hbase.master.TestSplitLogManager org.apache.hadoop.hbase.util.TestMergeTool org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol org.apache.hadoop.hbase.io.encoding.TestChangingEncoding org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.client.TestScannerTimeout org.apache.hadoop.hbase.replication.TestReplication
          Hide
          stack added a comment -

          Thats a wide variety in the types of failures. You get same kind of variance absent your patch N?

          Show
          stack added a comment - Thats a wide variety in the types of failures. You get same kind of variance absent your patch N?
          Hide
          Nicolas Liochon added a comment -

          @stack. A lot of variance, but not that much. So I know I broke something somewhere. I fixed a synchronization issue in v23 (plus the points mentioned in you review).

          Show
          Nicolas Liochon added a comment - @stack. A lot of variance, but not that much. So I know I broke something somewhere. I fixed a synchronization issue in v23 (plus the points mentioned in you review).
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516840/5399_inprogress.v23.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1081//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516840/5399_inprogress.v23.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1081//console This message is automatically generated.
          Hide
          Ted Yu added a comment -

          @N:
          Can you update the patch on review board ?
          It is 6 rev's behind.

          Thanks

          Show
          Ted Yu added a comment - @N: Can you update the patch on review board ? It is 6 rev's behind. Thanks
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12516907/5399.v27.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -129 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestZooKeeper
          org.apache.hadoop.hbase.util.TestMergeTool

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12516907/5399.v27.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -129 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.util.TestMergeTool Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1083//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517002/5399_inprogress.v32.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1088//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517002/5399_inprogress.v32.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1088//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517013/5399.v38.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause mvn compile goal to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1095//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1095//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517013/5399.v38.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause mvn compile goal to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1095//testReport/ Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1095//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517015/5399.v39.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -129 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestZooKeeper
          org.apache.hadoop.hbase.TestRegionRebalancing

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517015/5399.v39.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -129 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.TestZooKeeper org.apache.hadoop.hbase.TestRegionRebalancing Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1097//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          org.apache.hadoop.hbase.TestZooKeeper is surprising, because:

          • if we add a 7s sleep before in testMasterSessionExpired(), then it's much more difficult to reproduce.
          • in RecovableZooKeeper, there is no tests for SESSIONEXPIRED: if it happens there is no retry.

          So I will tend to think it's an existing issue, even if I need to understand how it's supposed to work when there is a session timeout. I tried to add it but it does not work.

          Show
          Nicolas Liochon added a comment - org.apache.hadoop.hbase.TestZooKeeper is surprising, because: if we add a 7s sleep before in testMasterSessionExpired(), then it's much more difficult to reproduce. in RecovableZooKeeper, there is no tests for SESSIONEXPIRED: if it happens there is no retry. So I will tend to think it's an existing issue, even if I need to understand how it's supposed to work when there is a session timeout. I tried to add it but it does not work.
          Hide
          Nicolas Liochon added a comment -

          TestRegionRebalancing: seems to be a flaky test. Will retry on Hadoop-QA, but I don't reproduce it here.
          TestRegionRebalancing: With the 7s sleep (i.e. same sleep as before), I don't reproduce it. I will try to understand why this sleep changes the result, but anyway it's not a regression.

          So this patch is a good candidate for a commit I think. Further enhancement (clusterId, ZK watcher replacement by simple calls) could be put in another JIRA.

          Show
          Nicolas Liochon added a comment - TestRegionRebalancing: seems to be a flaky test. Will retry on Hadoop-QA, but I don't reproduce it here. TestRegionRebalancing: With the 7s sleep (i.e. same sleep as before), I don't reproduce it. I will try to understand why this sleep changes the result, but anyway it's not a regression. So this patch is a good candidate for a commit I think. Further enhancement (clusterId, ZK watcher replacement by simple calls) could be put in another JIRA.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517212/5399.v40.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -129 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517212/5399.v40.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -129 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 155 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1114//console This message is automatically generated.
          Hide
          Lars Hofhansl added a comment -

          Too big for 0.94. Agreed?

          Show
          Lars Hofhansl added a comment - Too big for 0.94. Agreed?
          Hide
          Ted Yu added a comment -

          Should the following be modified to reflect the current target ?

          +   *  Deprecated in february 2012, targeted release: 0.94
          
          Show
          Ted Yu added a comment - Should the following be modified to reflect the current target ? + * Deprecated in february 2012, targeted release: 0.94
          Hide
          Ted Yu added a comment -
          +    // We put that all the possible setting to make it fails asap
          

          should read 'put all the possible settings to make it fail'.

          Please remove extra empty lines in checkHBaseAvailable().

          In HConnectionManager.java, can the java imports be moved back to top of file ?

          +    // We have a single lock for master & zk to prevents deadlocks. Having
          

          should read 'to prevent'.

          Please use spaces around '+' below:

          +      return "hconnection "+hashCode();
          
          Show
          Ted Yu added a comment - + // We put that all the possible setting to make it fails asap should read 'put all the possible settings to make it fail'. Please remove extra empty lines in checkHBaseAvailable(). In HConnectionManager.java, can the java imports be moved back to top of file ? + // We have a single lock for master & zk to prevents deadlocks. Having should read 'to prevent'. Please use spaces around '+' below: + return "hconnection " +hashCode();
          Hide
          stack added a comment -

          @LarsH I think its too radical a change in client behavior for 0.94. If we target it for 0.96, it'll be a ripple only compared to rpc changes; it won't be noticed.

          Show
          stack added a comment - @LarsH I think its too radical a change in client behavior for 0.94. If we target it for 0.96, it'll be a ripple only compared to rpc changes; it won't be noticed.
          Hide
          Nicolas Liochon added a comment -

          @all; Ok, I will provide an updated patch with the comments taken into account. I also understood the issue behind TestZooKeeper. My fix currently breaks other unit tests, but if I manage to make it work I will include it.

          For 0.94 vs. 0.96 ok as well, however, it would makes sense to deprecate some methods in the 0.94, especially getMaster & getZooKeeper in HConnection: people will get more time to react this way.

          Show
          Nicolas Liochon added a comment - @all; Ok, I will provide an updated patch with the comments taken into account. I also understood the issue behind TestZooKeeper. My fix currently breaks other unit tests, but if I manage to make it work I will include it. For 0.94 vs. 0.96 ok as well, however, it would makes sense to deprecate some methods in the 0.94, especially getMaster & getZooKeeper in HConnection: people will get more time to react this way.
          Hide
          stack added a comment -

          @nkeywal yes, agree, good to deprecate in 0.94 rather than 0.96 so more time to move off the old methods

          Show
          stack added a comment - @nkeywal yes, agree, good to deprecate in 0.94 rather than 0.96 so more time to move off the old methods
          Hide
          stack added a comment -

          ... so it seems like there needs to be a separate patch for 0.94?

          Show
          stack added a comment - ... so it seems like there needs to be a separate patch for 0.94?
          Hide
          Nicolas Liochon added a comment -

          I can do a patch with only the deprecated stuff, or it can be be added in a
          raw commit, as you like.

          On Tue, Mar 6, 2012 at 10:03 PM, stack (Commented) (JIRA)

          Show
          Nicolas Liochon added a comment - I can do a patch with only the deprecated stuff, or it can be be added in a raw commit, as you like. On Tue, Mar 6, 2012 at 10:03 PM, stack (Commented) (JIRA)
          Hide
          Lars Hofhansl added a comment -

          Let's file a child jira for the 0.94 deprecation.

          Show
          Lars Hofhansl added a comment - Let's file a child jira for the 0.94 deprecation.
          Hide
          Nicolas Liochon added a comment -

          This version integrates the last comments + the trunk. Locally, I've got random failures I didn't get 3 days ago. So ley me confirm before committing.

          Show
          Nicolas Liochon added a comment - This version integrates the last comments + the trunk. Locally, I've got random failures I didn't get 3 days ago. So ley me confirm before committing.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517728/5399.v41.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1143//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517728/5399.v41.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1143//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517731/5399.v42.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -123 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517731/5399.v42.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -123 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1144//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          They're all different from the ones I got locally. It could be pure test flakiness. Let's retry.

          Show
          Nicolas Liochon added a comment - They're all different from the ones I got locally. It could be pure test flakiness. Let's retry.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517743/5399.v42.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -123 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517743/5399.v42.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -123 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1147//console This message is automatically generated.
          Hide
          stack added a comment -

          That looks ok N? Will I commit?

          Show
          stack added a comment - That looks ok N? Will I commit?
          Hide
          Nicolas Liochon added a comment -

          I don't know. The errors are the same, but I don't reproduce them locally.
          Let's retry, if we have them a third time I will look at them tomorrow or Monday.

          Show
          Nicolas Liochon added a comment - I don't know. The errors are the same, but I don't reproduce them locally. Let's retry, if we have them a third time I will look at them tomorrow or Monday.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517856/5399.v42.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -123 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.regionserver.TestColumnSeeking

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517856/5399.v42.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -123 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestColumnSeeking Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1157//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517858/5399.v42.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 30 new or modified tests.

          -1 javadoc. The javadoc tool appears to have generated -123 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.master.TestDistributedLogSplitting
          org.apache.hadoop.hbase.mapreduce.TestImportTsv
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517858/5399.v42.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 30 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -123 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting org.apache.hadoop.hbase.mapreduce.TestImportTsv org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1158//console This message is automatically generated.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12517999/nochange.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 javadoc. The javadoc tool appears to have generated -123 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
          org.apache.hadoop.hbase.client.TestAdmin
          org.apache.hadoop.hbase.mapred.TestTableMapReduce
          org.apache.hadoop.hbase.mapreduce.TestImportTsv

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517999/nochange.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -123 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 159 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.mapred.TestTableMapReduce org.apache.hadoop.hbase.mapreduce.TestImportTsv Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1162//console This message is automatically generated.
          Hide
          Nicolas Liochon added a comment -

          Ok, even an empty patch gets these errors from hadoop-qa, so the v42 can be committed imho.

          Show
          Nicolas Liochon added a comment - Ok, even an empty patch gets these errors from hadoop-qa, so the v42 can be committed imho.
          Hide
          stack added a comment -

          I tried it locally and saw some flakeyness in the medium test runs... They don't seem to pass reliably anymore but its with or without this patch. I'm committing to trunk before this fat patch rots. We can come back to work on the flakey tests after it goes in. Thanks Nicolas for the nice patch. Applied to trunk.

          Show
          stack added a comment - I tried it locally and saw some flakeyness in the medium test runs... They don't seem to pass reliably anymore but its with or without this patch. I'm committing to trunk before this fat patch rots. We can come back to work on the flakey tests after it goes in. Thanks Nicolas for the nice patch. Applied to trunk.
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK #2676 (See https://builds.apache.org/job/HBase-TRUNK/2676/)
          HBASE-5399 Cut the link between the client and the zookeeper ensemble (Revision 1299872)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterNotRunningException.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MasterKeepAliveConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperKeepAliveConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeer.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteTable.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK #2676 (See https://builds.apache.org/job/HBase-TRUNK/2676/ ) HBASE-5399 Cut the link between the client and the zookeeper ensemble (Revision 1299872) Result = FAILURE stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterNotRunningException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MasterKeepAliveConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperKeepAliveConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteTable.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
          Hide
          Ted Yu added a comment -

          TestAtomicOperation failed in latest TRUNK build:
          https://builds.apache.org/job/HBase-TRUNK/2676/testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/

          Similar failure shows up in the latest Hadoop QA run of HBASE-5542

          Show
          Ted Yu added a comment - TestAtomicOperation failed in latest TRUNK build: https://builds.apache.org/job/HBase-TRUNK/2676/testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/ Similar failure shows up in the latest Hadoop QA run of HBASE-5542
          Hide
          Ted Yu added a comment -

          From test output:

          Exception in thread "Thread-211" junit.framework.AssertionFailedError	at junit.framework.Assert.fail(Assert.java:48)
          	at junit.framework.Assert.fail(Assert.java:56)
          	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392)
          

          Here is related code in test:

                        if (r.size() != 1) {
                          LOG.debug(r);
                          failures.incrementAndGet();
                          fail();
                        }
          
          Show
          Ted Yu added a comment - From test output: Exception in thread " Thread -211" junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.fail(Assert.java:56) at org.apache.hadoop.hbase.regionserver.TestAtomicOperation$2.run(TestAtomicOperation.java:392) Here is related code in test: if (r.size() != 1) { LOG.debug(r); failures.incrementAndGet(); fail(); }
          Hide
          Hudson added a comment -

          Integrated in HBase-TRUNK-security #135 (See https://builds.apache.org/job/HBase-TRUNK-security/135/)
          HBASE-5399 Cut the link between the client and the zookeeper ensemble (Revision 1299872)

          Result = FAILURE
          stack :
          Files :

          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterNotRunningException.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MasterKeepAliveConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperKeepAliveConnection.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeer.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
          • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteTable.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java
          • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
          Show
          Hudson added a comment - Integrated in HBase-TRUNK-security #135 (See https://builds.apache.org/job/HBase-TRUNK-security/135/ ) HBASE-5399 Cut the link between the client and the zookeeper ensemble (Revision 1299872) Result = FAILURE stack : Files : /hbase/trunk/src/main/java/org/apache/hadoop/hbase/MasterNotRunningException.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MasterKeepAliveConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperKeepAliveConnection.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/replication/VerifyReplication.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationPeer.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/Merge.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestZooKeeper.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestDeadServer.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestRollingRestart.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/client/TestRemoteTable.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTable.java /hbase/trunk/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
          Hide
          Lars Hofhansl added a comment -

          Yeah!

          Show
          Lars Hofhansl added a comment - Yeah!
          Hide
          Lars Hofhansl added a comment -

          I should look at TestAtomicOperation.

          Show
          Lars Hofhansl added a comment - I should look at TestAtomicOperation.
          Hide
          stack added a comment -

          @Nicolas What you make of Ted's comment above? You think it related in anyway? I'd doubt it is since this is client-side hackery, nought to do w/ serverside row level commits. Its just suspicious because build broke on TestAtomicOperation when this issue went in (But a bunch of others went in at same time...)

          Show
          stack added a comment - @Nicolas What you make of Ted's comment above? You think it related in anyway? I'd doubt it is since this is client-side hackery, nought to do w/ serverside row level commits. Its just suspicious because build broke on TestAtomicOperation when this issue went in (But a bunch of others went in at same time...)
          Hide
          stack added a comment -

          @Nicolas What you make of Ted's comment above? You think it related in anyway? I'd doubt it is since this is client-side hackery, nought to do w/ serverside row level commits. Its just suspicious because build broke on TestAtomicOperation when this issue went in (But a bunch of others went in at same time...)

          Show
          stack added a comment - @Nicolas What you make of Ted's comment above? You think it related in anyway? I'd doubt it is since this is client-side hackery, nought to do w/ serverside row level commits. Its just suspicious because build broke on TestAtomicOperation when this issue went in (But a bunch of others went in at same time...)
          Hide
          Nicolas Liochon added a comment -

          @stack
          Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well. I've seen it failing previously, I think it's flaky for at the very least a month (and may be much more)

          git log:

          commit 0f3e025a62f89763fffbf8298d565a6c4e5b7d06
          Date:   Sat Mar 10 02:27:05 2012 +0000
          

          With the same stack as in trunk #2676:

          Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.444 sec <<< FAILURE!
          testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation)  Time elapsed: 7.083 sec  <<< FAILURE!
          junit.framework.AssertionFailedError: expected:<0> but was:<1>
          	at junit.framework.Assert.fail(Assert.java:50)
          	at junit.framework.Assert.failNotEquals(Assert.java:287)
          	at junit.framework.Assert.assertEquals(Assert.java:67)
          	at junit.framework.Assert.assertEquals(Assert.java:199)
          	at junit.framework.Assert.assertEquals(Assert.java:205)
          	at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testMultiRowMutationMultiThreads(TestAtomicOperation.java:416)
          
          Show
          Nicolas Liochon added a comment - @stack Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well. I've seen it failing previously, I think it's flaky for at the very least a month (and may be much more) git log: commit 0f3e025a62f89763fffbf8298d565a6c4e5b7d06 Date: Sat Mar 10 02:27:05 2012 +0000 With the same stack as in trunk #2676: Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.444 sec <<< FAILURE! testMultiRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation) Time elapsed: 7.083 sec <<< FAILURE! junit.framework.AssertionFailedError: expected:<0> but was:<1> at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testMultiRowMutationMultiThreads(TestAtomicOperation.java:416)
          Hide
          stack added a comment -

          Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well.

          How do you do this? You run the test multiple times?

          I think it's flaky for at the very least a month (and may be much more)

          In your estimation, we broke this a while back. Any clue what did it?

          Thanks.

          Show
          stack added a comment - Yes, this test is flaky... I reproduce the error on the trunk as of March 10th as well. How do you do this? You run the test multiple times? I think it's flaky for at the very least a month (and may be much more) In your estimation, we broke this a while back. Any clue what did it? Thanks.
          Hide
          Lars Hofhansl added a comment -

          Only testMultiRowMutationMultiThreads is failing, which I added recently.
          I now think the test always had this problem.

          Show
          Lars Hofhansl added a comment - Only testMultiRowMutationMultiThreads is failing, which I added recently. I now think the test always had this problem.
          Hide
          Nicolas Liochon added a comment -

          @Stack; I think it fails 20% of the time. I run it alone, i.e. with -Dtest=TestAtomicOperation#testMultiRowMutationMultiThreads with nothing else running on the machine, and a mvn clean. No clue on when it started to happen.

          @Lars: I'm not sure I haven't seen failures on testRowMutationMultiThreads as well, I will launch a few tests to see if it happens.

          Show
          Nicolas Liochon added a comment - @Stack; I think it fails 20% of the time. I run it alone, i.e. with -Dtest=TestAtomicOperation#testMultiRowMutationMultiThreads with nothing else running on the machine, and a mvn clean. No clue on when it started to happen. @Lars: I'm not sure I haven't seen failures on testRowMutationMultiThreads as well, I will launch a few tests to see if it happens.
          Hide
          Ted Yu added a comment -

          Lars has found some clue in HBASE-5569

          Show
          Ted Yu added a comment - Lars has found some clue in HBASE-5569
          Hide
          Nicolas Liochon added a comment -

          @Lars, Stack:
          After 50 tries, on trunk (fbd4bebd5cca129f49e91ec9936f604998a7025a) + 5572 I got it:

          testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): expected:<0> but was:<5>
          at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331)

          So the probability for testRowMutationMultiThreads is 10 times inferior than for testMultiRowMutationMultiThreads but it can occur as well..

          Show
          Nicolas Liochon added a comment - @Lars, Stack: After 50 tries, on trunk (fbd4bebd5cca129f49e91ec9936f604998a7025a) + 5572 I got it: testRowMutationMultiThreads(org.apache.hadoop.hbase.regionserver.TestAtomicOperation): expected:<0> but was:<5> at org.apache.hadoop.hbase.regionserver.TestAtomicOperation.testRowMutationMultiThreads(TestAtomicOperation.java:331) So the probability for testRowMutationMultiThreads is 10 times inferior than for testMultiRowMutationMultiThreads but it can occur as well..
          Hide
          Lars Hofhansl added a comment -

          Arrgghhh... That's not good!
          Do you see above message in that case in the logs?
          Do you still have the log output? (Feel free to send a zip via email or attach here).

          Show
          Lars Hofhansl added a comment - Arrgghhh... That's not good! Do you see above message in that case in the logs? Do you still have the log output? (Feel free to send a zip via email or attach here).
          Hide
          Nicolas Liochon added a comment -

          I'm going to rerun it on trunk and on a 2 weeks old trunk as well. Except if you think differently, I will put in HBASE-5569, it seems to be the best place.

          Show
          Nicolas Liochon added a comment - I'm going to rerun it on trunk and on a 2 weeks old trunk as well. Except if you think differently, I will put in HBASE-5569 , it seems to be the best place.
          Hide
          Lars Hofhansl added a comment -

          Yeah. Let's move the discussion to HBASE-5569.
          Thanks for doing this Nicolas (pardon me if I misspelled your name). You don't have to, though. Your change here is almost certainly not causing this problem.

          Show
          Lars Hofhansl added a comment - Yeah. Let's move the discussion to HBASE-5569 . Thanks for doing this Nicolas (pardon me if I misspelled your name). You don't have to, though. Your change here is almost certainly not causing this problem.
          Hide
          Nicolas Liochon added a comment -

          It's ok, we're all in the same boat
          I've got the test running on a 2 weeks old version of the trunk, I will
          have the result tomorrow.

          On Wed, Mar 14, 2012 at 12:12 AM, Lars Hofhansl (Commented) (JIRA) <

          Show
          Nicolas Liochon added a comment - It's ok, we're all in the same boat I've got the test running on a 2 weeks old version of the trunk, I will have the result tomorrow. On Wed, Mar 14, 2012 at 12:12 AM, Lars Hofhansl (Commented) (JIRA) <
          Hide
          Ted Yu added a comment -

          TestZooKeeper fails quite often on Hadoop QA now.
          Here is a recent one:
          https://issues.apache.org/jira/browse/HBASE-5542?focusedCommentId=13229008&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229008

          When Ramkrishna worked on HBASE-5206:

          Tried out TestZookeeper.testClientSessionExpired. With or without patch it is failing.

          Show
          Ted Yu added a comment - TestZooKeeper fails quite often on Hadoop QA now. Here is a recent one: https://issues.apache.org/jira/browse/HBASE-5542?focusedCommentId=13229008&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13229008 When Ramkrishna worked on HBASE-5206 : Tried out TestZookeeper.testClientSessionExpired. With or without patch it is failing.
          Hide
          Ted Yu added a comment -

          TestZooKeeper.testClientSessionExpired fails on Hadoop QA after this patch went in.

          Show
          Ted Yu added a comment - TestZooKeeper.testClientSessionExpired fails on Hadoop QA after this patch went in.
          Hide
          Nicolas Liochon added a comment -

          You're right. It's a regression. I will fix this with 5549. I'm having a hard time trying to understand what's going on exactly here, and the "sleep" coding pattern does not help to say the least.

          Show
          Nicolas Liochon added a comment - You're right. It's a regression. I will fix this with 5549. I'm having a hard time trying to understand what's going on exactly here, and the "sleep" coding pattern does not help to say the least.
          Hide
          Nicolas Liochon added a comment -

          I've actually been able to reproduce the issue on a previous trunk version, so it's not linked to my changes, and that's a great step forward in term of rationality.

          Sometimes when we trigger a zk session expiry the zk connection remains in the "connected" state. I don't have any explication for this, but at least it's not a new behavior. I will manage this case explicitly in the version I'm writing for #5549.

          Show
          Nicolas Liochon added a comment - I've actually been able to reproduce the issue on a previous trunk version, so it's not linked to my changes, and that's a great step forward in term of rationality. Sometimes when we trigger a zk session expiry the zk connection remains in the "connected" state. I don't have any explication for this, but at least it's not a new behavior. I will manage this case explicitly in the version I'm writing for #5549.
          Hide
          stack added a comment -

          If you were able to trigger it in a version before your patch, I'd say reclose the issue N.

          Show
          stack added a comment - If you were able to trigger it in a version before your patch, I'd say reclose the issue N.
          Hide
          stack added a comment -

          I should have said, given you've repro'd in an older version – freeing up this commit of blame – AND that you are going to fix it anyways in 5549, I'd say its good to close this issue again. Good stuff.

          Show
          stack added a comment - I should have said, given you've repro'd in an older version – freeing up this commit of blame – AND that you are going to fix it anyways in 5549, I'd say its good to close this issue again. Good stuff.
          Hide
          stack added a comment -

          Marking closed.

          Show
          stack added a comment - Marking closed.

            People

            • Assignee:
              Nicolas Liochon
              Reporter:
              Nicolas Liochon
            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development