HBASE-7590

Add a costless notifications mechanism from master to regionservers & clients

Details

    • Reviewed
    • This sets up a multicast connection between the master and the HBase clients. With the feature on, when the master marks a regionserver as dead, it also sends a multicast message that makes the HBase client disconnect immediately from the dead server instead of waiting for a socket timeout. Specifically, this allows hbase.rpc.timeout to be set to larger values (such as 5 minutes) without impacting the MTTR: without this, even if the dead regionserver's data is now available on another server, the client stays on the dead one, waiting for an answer that will never come. Because it is a multicast message, it is cheap and scalable, but unreliable. For this reason, the master sends the information 5 times, so that the HBase client can miss a message. This feature is NOT activated by default. To activate it, add to your hbase-site.xml:

        <property>
          <name>hbase.status.published</name>
          <value>true</value>
        </property>

      You can also configure the IP address and port used with the following settings:

        <property>
          <name>hbase.status.multicast.address.ip</name>
          <value>226.1.1.3</value>
        </property>

        <property>
          <name>hbase.status.multicast.address.port</name>
          <value>6100</value>
        </property>
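
      Because the master repeats each notice, a client can receive the same dead-server message several times. The following is a minimal sketch of how a client might handle that idempotently. The class DeadServerTracker, its methods, and the server-name strings are hypothetical illustrations, not the actual HBase client code; removing an entry from a set stands in for closing a real connection.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side tracker; NOT the HBase client's actual class. */
class DeadServerTracker {
    // Servers already reported dead, so repeated notices are ignored.
    private final Set<String> deadServers = ConcurrentHashMap.newKeySet();
    // Servers the client currently holds a connection to (illustrative stand-in).
    private final Set<String> openConnections = ConcurrentHashMap.newKeySet();

    void connect(String serverName) {
        openConnections.add(serverName);
    }

    boolean isConnected(String serverName) {
        return openConnections.contains(serverName);
    }

    /**
     * Handles one dead-server notice. Returns true only the first time a
     * server is reported, so repeated sends trigger a single disconnect.
     */
    boolean onDeadServerNotice(String serverName) {
        if (!deadServers.add(serverName)) {
            return false; // duplicate of a notice already processed
        }
        openConnections.remove(serverName); // stand-in for closing the socket
        return true;
    }
}
```

      With this shape, missing four of the five notices costs nothing, and receiving all five does no extra work after the first.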
    • 0.96notable

    Description

      It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow:

      • to lower the load on the system, since clients would no longer use stale information and go to dead machines
      • to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can make the right decision and continue or stop waiting accordingly.

      We can also send more information, for example instructions like 'slow down' to tell the client to increase the retry delay, and so on.

      Technically, the master could send this information. To lower the load on the system, we should:

      • have multicast communication (i.e. the master does not have to connect to all servers by TCP), with one packet every 10 seconds or so.
      • receivers should not depend on this: if the information is available, great. If not, it should not break anything.
      • it should be optional.

      So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does nothing. On the client side, when we receive information that a node is dead, we refresh the cache for it.
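
      The master-side part of this design can be sketched as follows. This is only an illustration under stated assumptions: DeadServerPublisher and its methods are hypothetical names, and a newline-separated UTF-8 string stands in for the protobuf message mentioned above. A real implementation would call publish from a periodic thread; a receiver would join the same group with java.net.MulticastSocket, which is not shown here.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical publisher sketch; names and wire format are illustrative only. */
class DeadServerPublisher {
    private final InetAddress group; // null when multicast is not configured
    private final int port;

    DeadServerPublisher(InetAddress group, int port) {
        this.group = group;
        this.port = port;
    }

    /** Encodes server names as newline-separated UTF-8 (stand-in for protobuf). */
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the first `length` bytes of a received datagram buffer. */
    static List<String> decode(byte[] data, int length) {
        String text = new String(data, 0, length, StandardCharsets.UTF_8);
        if (text.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(text.split("\n")));
    }

    /** Sends one datagram; returns false and does nothing when unconfigured. */
    boolean publish(List<String> deadServers) throws IOException {
        if (group == null || deadServers.isEmpty()) {
            return false;
        }
        byte[] payload = encode(deadServers);
        // A plain DatagramSocket can send to a multicast group address;
        // no TCP connection to each receiver is needed.
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, group, port));
        }
        return true;
    }
}
```

      Returning early when no group is configured keeps the feature optional, and because the send is fire-and-forget, a receiver that misses a packet is not affected beyond falling back to its normal timeout.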

      Attachments

        1. 7590.inprogress.patch
          68 kB
          Nicolas Liochon
        2. 7590.v1.patch
          66 kB
          Nicolas Liochon
        3. 7590.v12.patch
          83 kB
          Nicolas Liochon
        4. 7590.v12.patch
          83 kB
          Nicolas Liochon
        5. 7590.v13.patch
          83 kB
          Nicolas Liochon
        6. 7590.v1-rebased.patch
          66 kB
          Nicolas Liochon
        7. 7590.v2.patch
          69 kB
          Nicolas Liochon
        8. 7590.v3.patch
          68 kB
          Nicolas Liochon
        9. 7590.v5.patch
          85 kB
          Nicolas Liochon
        10. 7590.v5.patch
          85 kB
          Nicolas Liochon

            People

              nkeywal Nicolas Liochon