HBASE-7590: Add a costless notifications mechanism from master to regionservers & clients

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      This feature sets up a multicast connection between the master and the HBase clients. When it is enabled and the master marks a regionserver as dead, the master also sends a multicast message that makes the HBase client disconnect immediately from the dead server instead of waiting for a socket timeout. Specifically, this makes it possible to set hbase.rpc.timeout to larger values (such as 5 minutes) without impacting the MTTR: without it, even if the dead regionserver's data is already available on another server, the client stays on the dead one, waiting for an answer that will never come. The message is multicast, hence cheap and scalable but unreliable. For this reason, the master sends the information 5 times, so that the HBase client can afford to miss a message. This feature is NOT activated by default. To activate it, add to your hbase-site.xml:

        <property>
          <name>hbase.status.published</name>
          <value>true</value>
        </property>

      You can also configure the IP address and port used with the following settings:

        <property>
          <name>hbase.status.multicast.address.ip</name>
          <value>226.1.1.3</value>
        </property>

        <property>
          <name>hbase.status.multicast.address.port</name>
          <value>6100</value>
        </property>
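
To illustrate the settings in this release note, here is a minimal sketch that reads the three properties from an hbase-site.xml style document using only the JDK's XML parser. The class StatusConfigSketch and its methods are hypothetical and not part of HBase; HBase loads these values through its own configuration classes. The address and port are simply the values quoted in the release note, and the "off unless set" behaviour mirrors the statement that the feature is not activated by default.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not an HBase class: reads <property> entries from XML. */
final class StatusConfigSketch {
    // Sample document built from the properties quoted in the release note.
    static final String SAMPLE =
        "<configuration>"
      + "<property><name>hbase.status.published</name><value>true</value></property>"
      + "<property><name>hbase.status.multicast.address.ip</name><value>226.1.1.3</value></property>"
      + "<property><name>hbase.status.multicast.address.port</name><value>6100</value></property>"
      + "</configuration>";

    /** Parses every <property><name/><value/></property> element into a map. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String name = e.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = e.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse configuration", ex);
        }
    }

    /** The release note says the feature is off unless explicitly enabled. */
    static boolean isPublished(Map<String, String> props) {
        return Boolean.parseBoolean(props.getOrDefault("hbase.status.published", "false"));
    }

    /** Reads the multicast port; fails loudly if the property is absent. */
    static int port(Map<String, String> props) {
        String raw = props.get("hbase.status.multicast.address.port");
        if (raw == null) {
            throw new IllegalArgumentException("hbase.status.multicast.address.port is not set");
        }
        return Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(SAMPLE);
        System.out.println("published = " + isPublished(props));
        System.out.println("address   = " + props.get("hbase.status.multicast.address.ip")
            + ":" + port(props));
    }
}
```

The sketch only demonstrates how the property names and values fit together; it does not open any socket.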
    • Tags:
      0.96notable

      Description

      It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow:

      • to lower the load on the system, since clients would no longer use stale information and go to dead machines
      • to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can take the right decision and continue or stop waiting accordingly.

      We can also send more information, for example instructions like 'slow down' to tell the client to increase the retry delay, and so on.

      Technically, the master could send this information. To lower the load on the system, we should:

      • have a multicast communication (i.e. the master does not have to connect to all servers by TCP), with one packet every 10 seconds or so.
      • receivers should not depend on this: if the information is available, great. If not, it should not break anything.
      • it should be optional.
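
The three constraints above can be sketched as a small periodic publisher. This is an illustration only: DeadServerPublisherSketch is a hypothetical class, the transport is abstracted as a Consumer (a real implementation would write to a multicast socket), and the repeat count of 5 is taken from the release note. The caller is assumed to invoke tick() from a timer, for example every 10 seconds as suggested in the list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch, not HBase code: republishes each dead server a fixed number of times. */
final class DeadServerPublisherSketch {
    // The release note says the master sends the information 5 times.
    static final int SENDS_PER_SERVER = 5;

    // Server name -> number of ticks in which it still has to be published.
    private final Map<String, Integer> remainingSends = new LinkedHashMap<>();

    // Transport abstraction; null means "not configured", i.e. the feature is off.
    private final Consumer<List<String>> sender;

    DeadServerPublisherSketch(Consumer<List<String>> sender) {
        this.sender = sender;
    }

    /** Called when a server is declared dead. A no-op when the feature is off. */
    synchronized void markDead(String serverName) {
        if (sender == null) {
            return;
        }
        remainingSends.put(serverName, SENDS_PER_SERVER);
    }

    /** One periodic tick: sends a single packet listing all pending dead servers. */
    synchronized List<String> tick() {
        if (sender == null || remainingSends.isEmpty()) {
            return new ArrayList<>(); // nothing to do, and nothing breaks
        }
        List<String> batch = new ArrayList<>(remainingSends.keySet());
        // Each server in this batch has now been published once more.
        remainingSends.replaceAll((name, left) -> left - 1);
        remainingSends.values().removeIf(left -> left <= 0);
        sender.accept(batch);
        return batch;
    }

    public static void main(String[] args) {
        DeadServerPublisherSketch publisher =
            new DeadServerPublisherSketch(batch -> System.out.println("send " + batch));
        publisher.markDead("rs1.example.org");
        // In a real deployment a scheduler would call tick() periodically.
        for (int i = 0; i < 7; i++) {
            publisher.tick();
        }
    }
}
```

Because receivers must not depend on the notifications, the sketch makes no attempt to confirm delivery; repetition is the only protection against lost packets.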

      So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does nothing. On the client side, when we receive the information that a node is dead, we refresh the cache about it.
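
The client-side half of this paragraph can be sketched as follows. LocationCacheSketch is a hypothetical, simplified stand-in for the client's region location cache, and the sketch starts after the message has been decoded (the protobuf wire format is out of scope here). The handler is idempotent, so receiving the same notification several times, or missing some copies, is harmless.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch, not HBase code: a region -> server cache that reacts to dead-server notifications. */
final class LocationCacheSketch {
    private final Map<String, String> regionToServer = new ConcurrentHashMap<>();

    void cache(String region, String server) {
        regionToServer.put(region, server);
    }

    /** Returns the cached server for a region, or null if the location must be looked up again. */
    String lookup(String region) {
        return regionToServer.get(region);
    }

    /**
     * Drops every cached location that points at a dead server.
     * Returns the number of evicted entries (exact only when no other thread
     * modifies the cache at the same time).
     */
    int onDeadServers(Collection<String> deadServers) {
        int before = regionToServer.size();
        regionToServer.values().removeIf(deadServers::contains);
        return before - regionToServer.size();
    }

    public static void main(String[] args) {
        LocationCacheSketch cache = new LocationCacheSketch();
        cache.cache("region-1", "rsA");
        cache.cache("region-2", "rsB");
        cache.cache("region-3", "rsA");

        // The same notification may arrive several times; only the first has an effect.
        System.out.println("evicted: " + cache.onDeadServers(Arrays.asList("rsA")));
        System.out.println("evicted: " + cache.onDeadServers(Arrays.asList("rsA")));
        System.out.println("region-2 -> " + cache.lookup("region-2"));
    }
}
```

After eviction, the next request for an affected region finds no cached location and has to locate the region again, instead of waiting on the dead server until the RPC timeout expires.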

      1. 7590.v5.patch
        85 kB
        Nicolas Liochon
      2. 7590.v5.patch
        85 kB
        Nicolas Liochon
      3. 7590.v3.patch
        68 kB
        Nicolas Liochon
      4. 7590.v2.patch
        69 kB
        Nicolas Liochon
      5. 7590.v1-rebased.patch
        66 kB
        Nicolas Liochon
      6. 7590.v13.patch
        83 kB
        Nicolas Liochon
      7. 7590.v12.patch
        83 kB
        Nicolas Liochon
      8. 7590.v12.patch
        83 kB
        Nicolas Liochon
      9. 7590.v1.patch
        66 kB
        Nicolas Liochon
      10. 7590.inprogress.patch
        68 kB
        Nicolas Liochon

          Activity

          stack stack added a comment -

          Marking closed.

          nkeywal Nicolas Liochon added a comment -

          Inconsistency fixed in HBASE-9452, release note updated.

          jdcryans Jean-Daniel Cryans added a comment -

          I fixed the release note. It was not "MulticastListener" but "MultiCastListener" and "MulticastPublisher" does have a lower case. Consistency guys

          nkeywal Nicolas Liochon added a comment -

          Here is the proposal for the release note:

          This feature sets up a multicast connection between the master and the HBase clients. When it is enabled and the master marks a regionserver as dead, the master also sends a multicast message that makes the HBase client disconnect immediately from the dead server instead of waiting for a socket timeout. Specifically, this makes it possible to set hbase.rpc.timeout to larger values (such as 5 minutes) without impacting the MTTR: without it, even if the dead regionserver's data is already available on another server, the client stays on the dead one, waiting for an answer that will never come. The message is multicast, hence cheap and scalable but unreliable. For this reason, the master sends the information 5 times, so that the HBase client can afford to miss a message. This feature is NOT activated by default. To activate it, add to your hbase-site.xml:
          <property>
          <name>hbase.status.publisher.class</name>
          <value>org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher</value>
          </property>

          <property>
          <name>hbase.status.listener.class</name>
          <value>org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener</value>
          </property>

          You can also configure the IP address and port used with the following settings:
          <property>
          <name>hbase.status.multicast.address.ip</name>
          <value>226.1.1.3</value>
          </property>

          <property>
          <name>hbase.status.multicast.address.port</name>
          <value>6100</value>
          </property>

          nkeywal Nicolas Liochon added a comment -

          ok, will do (as for rb) by the end of this week.

          stack stack added a comment -

          Added comments up on rb. This issue needs a fat release note w/ what it does, and how to turn it on.

          hudson Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #454 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/454/)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients - new files (Revision 1458199)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients (Revision 1458184)

          Result = FAILURE
          nkeywal :
          Files :

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestClusterStatusPublisher.java

          nkeywal :
          Files :

          • /hbase/trunk/hbase-client/pom.xml
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/Chore.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClientRPC.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ProtobufRpcClientEngine.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientEngine.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/IncrementCoalescer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWithScanLimits.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWrapper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
          hudson Hudson added a comment -

          Integrated in hbase-0.95-on-hadoop2 #33 (See https://builds.apache.org/job/hbase-0.95-on-hadoop2/33/)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients - new files (Revision 1458202)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients (Revision 1458188)

          Result = FAILURE
          nkeywal :
          Files :

          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestClusterStatusPublisher.java

          nkeywal :
          Files :

          • /hbase/branches/0.95/hbase-client/pom.xml
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/Chore.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClientRPC.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ProtobufRpcClientEngine.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientEngine.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/IncrementCoalescer.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWithScanLimits.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWrapper.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
          hudson Hudson added a comment -

          Integrated in HBase-TRUNK #3972 (See https://builds.apache.org/job/HBase-TRUNK/3972/)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients - new files (Revision 1458199)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients (Revision 1458184)

          Result = SUCCESS
          nkeywal :
          Files :

          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestClusterStatusPublisher.java

          nkeywal :
          Files :

          • /hbase/trunk/hbase-client/pom.xml
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/Chore.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClientRPC.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ProtobufRpcClientEngine.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientEngine.java
          • /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/IncrementCoalescer.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWithScanLimits.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWrapper.java
          • /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
          Show
          hudson Hudson added a comment - Integrated in HBase-TRUNK #3972 (See https://builds.apache.org/job/HBase-TRUNK/3972/ ) HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients - new files (Revision 1458199) HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients (Revision 1458184) Result = SUCCESS nkeywal : Files : /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestClusterStatusPublisher.java nkeywal : Files : /hbase/trunk/hbase-client/pom.xml /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/Chore.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClientRPC.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ProtobufRpcClientEngine.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientEngine.java /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/IncrementCoalescer.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWithScanLimits.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWrapper.java /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
          hudson Hudson added a comment -

          Integrated in hbase-0.95 #85 (See https://builds.apache.org/job/hbase-0.95/85/)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients - new files (Revision 1458202)
          HBASE-7590 Add a costless notifications mechanism from master to regionservers & clients (Revision 1458188)

          Result = FAILURE
          nkeywal :
          Files :

          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClusterStatusListener.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestClusterStatusPublisher.java

          nkeywal :
          Files :

          • /hbase/branches/0.95/hbase-client/pom.xml
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/Chore.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ClusterStatus.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClient.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/HBaseClientRPC.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/ProtobufRpcClientEngine.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClientEngine.java
          • /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java
          • /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreFlusher.java
          • /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/thrift/IncrementCoalescer.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseCluster.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/TestMultiVersions.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/catalog/TestMetaReaderEditorNoCluster.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/HConnectionTestingUtility.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWithScanLimits.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilterWrapper.java
          • /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTimeRangeMapRed.java
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12574232/7590.v13.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//console

          This message is automatically generated.

          nkeywal Nicolas Liochon added a comment -

          Maybe 13 is going to be my lucky number?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12574136/7590.v12.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 32 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4863//console

          This message is automatically generated.

          nkeywal Nicolas Liochon added a comment -

          v12 with the comments on RB from Devaraj taken into account. Nearly there!

          nkeywal Nicolas Liochon added a comment -

          I will fix the lines longer than 100 characters on commit. Any +1 on the new version?

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12573376/7590.v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 35 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 site. The patch appears to cause mvn site goal to fail.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4780//console

          This message is automatically generated.

          nkeywal Nicolas Liochon added a comment -

          Comments taken into account, and I added the IOException instead of only the ZooKeeperConnectionException.
          I will add it to RB as well.

          devaraj Devaraj Das added a comment -

          Left some comments on RB. I have not gone through the whole patch yet.

          devaraj Devaraj Das added a comment -

          FYI, the RB link is https://reviews.apache.org/r/9731/. I am taking a look at the patch.

          nkeywal Nicolas Liochon added a comment -

          It's on RB, waiting for reviews before being committed.

          nkeywal Nicolas Liochon added a comment -

          At last. Ready for review (will post on review board as well).

          hadoopqa Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571396/7590.v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 14 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4602//console

          This message is automatically generated.

          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12571229/7590.v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 17 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          -1 javac. The patch appears to cause mvn compile goal to fail.

          -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 lineLengths. The patch does not introduce lines longer than 100

          -1 core tests. The patch failed these unit tests:

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4580//testReport/
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4580//console

          This message is automatically generated.

          nkeywal Nicolas Liochon added a comment -

          Nearly there. There is one todo left: HConnectionImplementation throws a ZooKeeperConnectionException, I wonder if I should make it throw an IOException instead.

          So now, if activated:

          • the server sends a status message, at most one every 10 seconds. It contains the list of the newly dead servers. When a server dies, it is sent 5 times, in case a client misses a message. If there are more than 10 servers to send, they are sent in multiple messages (one every 10 seconds), the newly dead first.
          • the clients listen for status messages. When they receive the notification that a server is dead, they clean their cache and close the connection to this server. When creating a new connection, they check that the server is not dead. For this, they use the server name and the start code instead of the hostname:port only.
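
          The master-side batching rules above can be sketched as follows. This is a hypothetical illustration, not the code from the patch: the class name `DeadServerPublisher`, its methods, and the "newest death first" ordering by timestamp are assumptions; only the limits (10 servers per message, 5 sends per server) come from the comment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the master-side batching; not the patch's actual code. */
final class DeadServerPublisher {
    static final int MAX_SERVERS_PER_MESSAGE = 10; // limit taken from the comment
    static final int SENDS_PER_SERVER = 5;         // repeat count taken from the comment

    /** One dead server that still has to be announced. */
    private static final class Entry {
        final String serverName;   // assumed form: "host,port,startcode"
        final long deathTimeMs;
        int remainingSends = SENDS_PER_SERVER;

        Entry(String serverName, long deathTimeMs) {
            this.serverName = serverName;
            this.deathTimeMs = deathTimeMs;
        }
    }

    private final List<Entry> pending = new ArrayList<>();

    /** Records a server the master has just marked as dead. */
    void serverDied(String serverName, long deathTimeMs) {
        pending.add(new Entry(serverName, deathTimeMs));
    }

    /**
     * Meant to be called by a timer once per publishing period (10 seconds in the comment).
     * Returns the server names for the next message, or an empty list if there is nothing to send.
     */
    List<String> nextMessage() {
        // Newly dead servers go first.
        pending.sort(Comparator.comparingLong((Entry e) -> e.deathTimeMs).reversed());
        List<String> batch = new ArrayList<>();
        for (Entry e : pending) {
            if (batch.size() == MAX_SERVERS_PER_MESSAGE) {
                break; // the rest waits for a later message
            }
            batch.add(e.serverName);
            e.remainingSends--;
        }
        // Servers that have been announced 5 times are dropped.
        pending.removeIf(e -> e.remainingSends == 0);
        return batch;
    }
}
```

          With 12 dead servers, this sketch announces the 10 most recent ones in five consecutive messages, then the remaining 2 in the next five, and then goes quiet.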
          nkeywal Nicolas Liochon added a comment -

          The current patch shows the work in progress. All tests pass, with or without multicast activated. It also works on a real cluster.
          I still have some work to do:

          • I've hijacked the current ClusterStatus protobuf, I'm going to create a specific one
          • I need to do some cleanup around ServerName & ServerCallable.
          • plus various.
          nkeywal Nicolas Liochon added a comment -

          It seems to be a separate hdfs recovery issue. But maybe there are two issues, I need to dig more...

          nkeywal Nicolas Liochon added a comment -

          Yep, but at the beginning, we're just waiting 1 second. It's only at the 8th try that we get serious and wait for 16 seconds: for a recovery that usually takes a few tens of seconds, the first seven tries are not very useful. There is also still the risk of a herd effect: you don't want 1000 servers to go immediately to the poor meta because they received the same notification. And lastly, sometimes the 10 tries are consumed without going to the right server; I need to understand why...
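
          To make the arithmetic concrete, here is a minimal sketch of a table-driven retry pause. The multiplier table is an assumption chosen to match the numbers above (1 second at the start, 16 seconds late in a sequence of 10 tries, counting from zero); the class and method names are hypothetical. The random jitter is not something proposed in this thread, it is just one common way to keep many clients from hitting meta at the same instant.

```java
import java.util.Random;

/** Hypothetical sketch of a table-driven retry pause; the names and the table are assumptions. */
final class RetryBackoff {
    // Assumed multipliers: early tries wait the base pause, late tries wait much longer.
    static final int[] MULTIPLIERS = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32};

    /** Pause before retry number {@code tries} (counted from zero). */
    static long pauseMs(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), MULTIPLIERS.length - 1);
        return basePauseMs * MULTIPLIERS[idx];
    }

    /**
     * Same pause plus a random extra of up to {@code jitterFraction} of it, so that clients
     * that received the same notification do not all retry at the same moment.
     */
    static long pauseWithJitterMs(long basePauseMs, int tries, double jitterFraction, Random rng) {
        long pause = pauseMs(basePauseMs, tries);
        return pause + (long) (pause * jitterFraction * rng.nextDouble());
    }
}
```

          With a 1 second base pause, the first seven tries in this sketch add up to 15 seconds of waiting in total, which is why they rarely outlast a recovery that takes tens of seconds.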

          sershe Sergey Shelukhin added a comment -

          Oh, I see what you are saying. Yeah, it's a different scenario, geared towards when you get "RegionMovedException".
          However, wasn't this supposed to be handled by the slowly increasing retry timeout?
          I.e. why does it immediately come to meta after a failure? There should be a delay, right? An additional delay seems like overkill, e.g. it will hurt the (hopefully common) fast recovery scenario.

          nkeywal Nicolas Liochon added a comment -

          Sergey Shelukhin, if I'm not wrong, we're not updating meta until the region is reassigned? I.e. when a box dies, .meta. contains stale info until the split is finished & the assignment process has finished. So we would get the "RegionOpeningException" only at the very end, when we connect to the new server, but we would still go to meta before?

          sershe Sergey Shelukhin added a comment -

          HBASE-7649 has a fix that I just added yesterday (not tested yet) where while the region is opening the client won't go to meta for this region for a grace period.

          nkeywal Nicolas Liochon added a comment -

          I've got something working.
          There is an interesting side effect: the client is informed immediately that the regionserver died, so it immediately goes to .meta. As the recovery is not done, .meta. contains the same (dead) location, so the client fails again and comes back immediately to .meta. => We're hammering .meta. now. The easy fix is to add a ~10s sleep on the client. A possibly better fix from an MTTR point of view would be to have the master send messages to say that a server recovery is finished. I will go for the former first.

          nkeywal Nicolas Liochon added a comment -

          Actually, one of the issues is that in the client code, we don't really manage the server name. We use the hostname & the port, but we don't directly use the start code... There is a sequence number, but I need to find out if it matches the start code.

          Despite this, I have something working for the server side, and the client receives the status. The point is to put the checks properly in the client (and this is unrelated to the communication protocol).
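
          As an illustration of why the start code matters on the client, here is a small sketch of a dead-server set keyed by the full server name. It is an assumption-laden example, not the client code from the patch: `DeadServerTracker` and its methods are hypothetical, and the comma-separated key only mirrors the usual host,port,startcode form of a server name.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side set of dead servers, keyed by host, port and start code. */
final class DeadServerTracker {
    private final Set<String> dead = ConcurrentHashMap.newKeySet();

    /** Builds the key; the start code distinguishes two incarnations on the same host:port. */
    static String serverName(String host, int port, long startCode) {
        return host + "," + port + "," + startCode;
    }

    /** Called when a status message reports this server incarnation as dead. */
    void markDead(String host, int port, long startCode) {
        dead.add(serverName(host, port, startCode));
    }

    /** Checked before opening a connection to a server. */
    boolean isDead(String host, int port, long startCode) {
        return dead.contains(serverName(host, port, startCode));
    }
}
```

          If the check used hostname:port only, a regionserver restarted on the same machine would still look dead to the client; with the start code in the key, only the old incarnation is rejected.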

          nkeywal Nicolas Liochon added a comment -

          "How about clients watching the region server's ephemeral nodes."

          We would also need to manage the new regionservers and the client disconnect. There could be extra cases with short lived clients that could hammer ZK. Using a separate znode allows sharing a lot of code between a multicast mode and a ZK mode. Listening directly to all znodes from the client would mean having just a ZK mode imho (but it could be fine).

          enis Enis Soztutar added a comment -

          How about clients watching the region server's ephemeral nodes? The only problem is that there might be a herd effect when a server goes down.

          apurtell Andrew Purtell added a comment -

          The trouble with multicast is in some networks it presents a problem. As an option, it's fine. As the only option, it is not.

          I don't claim to deeply understand the issues, but I know multicast traffic can trigger broadcast floods, I think the risk is high when internal routing is reconverging and there is a logical<->physical topology incongruence. The point is, some NOCs may prefer to not use multicast.

          I think case in point, on EC2 multicast doesn't work at all: https://forums.aws.amazon.com/message.jspa?messageID=52742, http://aws.amazon.com/vpc/faqs/#R4

          nkeywal Nicolas Liochon added a comment -

          After more thinking about it:

          So if I recapitulate the proposal:

          • the master sends the dead servers list, with no more than 1 message per 10 seconds
          • clients (including regionservers) use this information to stop using a server identified as dead
          • so when they have a huge timeout, because they are doing a slow operation server side, they check that the server is still there.

          Advantages

          • makes slow operations easier to manage
          • fewer false positives client side (the master is the reference)
          • clients react immediately when the server is actually dead.
          • it's optional.

          For the implementation, there are 3 options

          Option Multicast:
          No subscribe: the client listens on the right ip.
          Easy backup for the master: the active master uses the right ip.

          Option Do it yourself in UDP but without multicast
          It seems that this would require:

          • the client starts to listen on a port, then contacts the master.
          • so clients register themselves on the master: it means they connect to the master when they start
          • the master needs to watch if the client is still there (or clients need to resubscribe every 10 minutes or so)
          • if the master fails, the clients need to resubscribe (or the state must be put in ZK).

          Option ZK:
          Multiple implementation choices. One of them is to put the protobuf message in a znode that can be watched by the clients if they wish.
          If so:

          • no direct dependency from client to master.
          • will benefit from local session when available
          • every client has a permanent tcp connection to ZK.
          • ZK receives/resends one message every 10s or so
          • as usual, on each event, the client needs to watch again.

          I don't think that the second option (do it yourself) is very good.
          It's not uncommon to have two options for such features, for the people who don't want multicast. My personal choice would be Multicast, then ZK.
          Writing a znode every 10 seconds is not perfect (I guess ZK devs would say ZK is not built for this), but should be manageable, even if the message size is around 1Kb. ZK would do for small clusters without new config required, multicast for large.

          I could start with ZK, but I don't really like the idea of writing something that would not scale. So in terms of timeframe my preference would be to go for the multicast first...

          Thoughts?

          nkeywal Nicolas Liochon added a comment - - edited

          Sorry, I missed your comment.

          • znodes+watches was initially my preferred target, but it's not ideal today, we need at least ZOOKEEPER-1147.
            There are also differences between the contract we need and the ZK features: we accept missing events and we don't need synchronisation between servers (a single master sending its view of the world is enough), and we don't want to monitor the client. Additionally, we don't need to write, but that's what ZOOKEEPER-1147 is about. My feeling is that waiting for ZK can take a long time...
          • I read the doc, I'm unclear about the cost. They say it's replaced by a broadcast for the switches (not the routers) when they don't support a specific protocol. While I guess that it may be challenging for systems with thousands of messages per second or hundreds of thousands of client applications, here it seems quite simple: a maximum number of clients in the order of a few thousand + 1 message every 10 seconds or so.

          I don't mind being proven wrong here. I've used such mechanisms (with actually many more messages) in the past, but all the datacenter guys were quite used to this type of architecture so they had no questions on it and I didn't dig into the operational impact.

          Anyway, there is no doubt it should be an optional mechanism.

          apurtell Andrew Purtell added a comment -

          Could ZK znodes+watches serve as a communication channel between the master and regionservers?

          I'm not sure multicast should be considered "costless". See http://tools.ietf.org/html/draft-mcbride-armd-mcast-overview-00


            People

            • Assignee:
              nkeywal Nicolas Liochon
              Reporter:
              nkeywal Nicolas Liochon
            • Votes:
              0
              Watchers:
              19

                Development