HBase / HBASE-4246

Cluster with too many regions cannot withstand some master failover scenarios

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.90.4
    • Fix Version/s: None
    • Component/s: master, Zookeeper
    • Labels:
      None

      Description

      We ran into the following sequence of events:

      • Master startup failed (for an unrelated reason) after only ROOT had been assigned.
      • We restarted the master without restarting the other servers. Since at least one region was assigned, it went through the failover code path.
      • The master scanned META and inserted every region into /hbase/unassigned in ZK.
      • It then called "listChildren" on the /hbase/unassigned znode and crashed with "Packet len6080218 is out of range!" because the IPC response was larger than the default maximum.

        Issue Links

          Activity

          Todd Lipcon added a comment -

          We were able to work around this issue by bumping jute.maxbuffer up to 100MB on the cluster in question.

          Another solution would be to shard the /hbase/unassigned directory by a prefix of the region ID. For example, region 1234567890abcdef would go in /hbase/unassigned/1234/1234567890abcdef. We would then have to do a traversal to get the full list, but any particular RPC response would be limited in size.
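
A minimal sketch of the sharded layout proposed above, assuming a 4-character prefix as in the example path. The class name, method names, and the `PREFIX_LEN` constant are hypothetical and do not correspond to any existing HBase code; only the path shape comes from the comment.

```java
public final class ShardedUnassignedPath {
    // Parent znode named in this issue; the sharded layout is only a proposal.
    static final String UNASSIGNED = "/hbase/unassigned";

    // Hypothetical shard width: the first 4 characters of the region ID.
    static final int PREFIX_LEN = 4;

    // Returns the intermediate shard znode that would hold the given region.
    static String shardFor(String regionId) {
        if (regionId.length() < PREFIX_LEN) {
            throw new IllegalArgumentException("region ID too short: " + regionId);
        }
        return UNASSIGNED + "/" + regionId.substring(0, PREFIX_LEN);
    }

    // Returns the full znode path for the region under its shard.
    static String pathFor(String regionId) {
        return shardFor(regionId) + "/" + regionId;
    }

    public static void main(String[] args) {
        System.out.println(pathFor("1234567890abcdef"));
    }
}
```

Listing every unassigned region under this layout would take one listing of /hbase/unassigned plus one listing per shard, which is the traversal cost mentioned above.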

          Lars Hofhansl added a comment -

          Moving out of 0.94. Pull back if you feel otherwise.

          gaojinchao added a comment -

          Hi, it also happened in our cluster when we restarted the whole cluster (it has 129723 regions).

          2012-06-19 19:29:00,961 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 80400ccd4a1f3438cc23774ca8a88d17 with OFFLINE state
          2012-06-19 19:29:00,965 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=80400ccd4a1f3438cc23774ca8a88d17
          2012-06-19 19:29:00,966 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:20000-0x137ed2eb936fb85 Creating (or updating) unassigned node for 7f1a56641906ae0a6cc6919bd927df76 with OFFLINE state
          2012-06-19 19:29:00,969 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=172-16-6-2:20000, region=7f1a56641906ae0a6cc6919bd927df76
          2012-06-19 19:29:01,070 WARN org.apache.zookeeper.ClientCnxn: Session 0x137ed2eb936fb85 for server 172-16-6-1/172.16.6.1:2181, unexpected error, closing socket connection and attempting reconnect
          2012-06-19 19:29:01,070 WARN org.apache.zookeeper.ClientCnxn: Session 0x137ed2eb936fb85 for server 172-16-6-1/172.16.6.1:2181, unexpected error, closing socket connection and attempting reconnect
          java.io.IOException: Packet len4670048 is out of range!
          at org.apache.zookeeper.ClientCnxn$SendThread.readLength(ClientCnxn.java:721)
          at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:880)
          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)
          2012-06-19 19:29:01,174 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:20000-0x137ed2eb936fb85 Unable to list children of znode /hbase/unassigned
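
A rough size estimate for the listChildren reply, as a sketch only. It assumes the reply consists of a 16-byte reply header (xid, zxid, err), a 4-byte child count, and a 4-byte length prefix before each child name, and that the client-side default for jute.maxbuffer is 4096 * 1024 bytes. These layout and default values are assumptions about the ZooKeeper client of that era, not something stated in this issue. The region count and the 32-character encoded region names are taken from the log above.

```java
public final class ListChildrenSizeEstimate {
    // Assumed reply header: int xid + long zxid + int err.
    static final int REPLY_HEADER_BYTES = 4 + 8 + 4;
    // Assumed 4-byte element count before the list of child names.
    static final int VECTOR_COUNT_BYTES = 4;
    // Assumed 4-byte length prefix before each child name.
    static final int STRING_LENGTH_BYTES = 4;
    // Assumed client-side default limit when jute.maxbuffer is not set.
    static final int ASSUMED_CLIENT_LIMIT = 4096 * 1024;

    // Estimated reply size in bytes for `children` names of `nameLength` bytes each.
    static long estimate(long children, int nameLength) {
        return REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES
                + children * (STRING_LENGTH_BYTES + nameLength);
    }

    // True if a reply of this size stays under the assumed limit.
    static boolean fits(long replyBytes) {
        return replyBytes < ASSUMED_CLIENT_LIMIT;
    }

    public static void main(String[] args) {
        long size = estimate(129723, 32);
        System.out.println(size + " bytes, fits = " + fits(size));
    }
}
```

Under these assumptions, 129723 children with 32-character names give 4670048 bytes, which is the packet length reported in the exception above and is over the assumed 4194304-byte limit.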

          ramkrishna.s.vasudevan added a comment -

          @Gao
          Did you get this in 0.90, or some other version?

          Laxman added a comment -

          This may also occur in the latest version, since we haven't changed the znode hierarchy for unassigned regions. As mentioned in the linked issue, there is a cap on packet length, so we can't read or write huge amounts of data in a single packet.

          IMO, to resolve this we need to do one of the following:

          • In HBase: use a hierarchical structure.
            The HDFS datanode follows a similar strategy: it keeps block files in separate subdirectories to avoid FS lookup latency.
          • In ZooKeeper: increase the limit. What is reasonable?
            We tried this in another project, but it has side effects. When we read or wrote huge data through ZooKeeper, clients occasionally got disconnected, since request processing is sequential. Please check out the related discussion at:

          http://mail-archives.apache.org/mod_mbox/zookeeper-user/201007.mbox/%3CC85A33EC.3A46A%25mahadev@yahoo-inc.com%3E

          The following discussion and JIRA are also applicable to the current scenario:
          http://mail-archives.apache.org/mod_mbox/zookeeper-user/201104.mbox/%3CFFA3BDB6-1C83-42B9-B2A0-7675134626C5@me.com%3E
          https://issues.apache.org/jira/browse/ZOOKEEPER-1049

          gaojinchao added a comment -

          The version is 0.90.x. I have asked the customer to raise jute.maxbuffer to 64M.
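
For reference, jute.maxbuffer is a Java system property whose value is a byte count. The sketch below only shows the value that 64M corresponds to and one way to set the property from code. The class and method names are hypothetical. It assumes the property is set before any ZooKeeper classes are loaded; in practice it is usually passed as a -D option on the JVM command line of the affected processes instead.

```java
public final class JuteMaxBuffer {
    static final String PROPERTY = "jute.maxbuffer";

    // Converts megabytes to bytes, failing loudly on int overflow.
    static int megabytes(int mb) {
        return Math.multiplyExact(mb, 1024 * 1024);
    }

    // Sets the system property for this JVM. This is assumed to have an effect
    // only if it runs before the ZooKeeper client classes are loaded.
    static void raise(int mb) {
        System.setProperty(PROPERTY, Integer.toString(megabytes(mb)));
    }

    public static void main(String[] args) {
        raise(64);
        System.out.println(PROPERTY + "=" + Integer.getInteger(PROPERTY));
    }
}
```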

          Devaraj Das added a comment -

          Am downgrading the priority since there is a work-around (increase the buffer size).

          stack added a comment -

          I added a note to the troubleshooting doc, in the master startup section.

          Hudson added a comment -

          Integrated in HBase-TRUNK #3759 (See https://builds.apache.org/job/HBase-TRUNK/3759/)
          Add to troubleshooting a note on zk buffer size issue when lots of regions – hbase-4246 (Revision 1434087)

          Result = FAILURE

          Hudson added a comment -

          Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #351 (See https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/351/)
          Add to troubleshooting a note on zk buffer size issue when lots of regions – hbase-4246 (Revision 1434087)

          Result = FAILURE

          stack added a comment -

          Not being worked on for 0.95. Moving it out. Leaving it around in case others run into the issue, so they can find the workaround. We should also consider doing the suggested hierarcherying (word?) in zk.


            People

            • Assignee: Unassigned
            • Reporter: Todd Lipcon
            • Votes: 1
            • Watchers: 15
