Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-6088

Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.94.0
    • 0.94.1, 0.95.0
    • None
    • None
    • Reviewed

    Description

      Region splitting not happened for long time due to ZK exception while creating RS_ZK_SPLITTING node

      2012-05-24 01:45:41,363 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26668ms for sessionid 0x1377a75f41d0012, closing socket connection and attempting reconnect
      2012-05-24 01:45:41,464 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
      
      2012-05-24 01:45:43,300 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: cleanupCurrentWriter  waiting for transactions to get synced  total 189377 synced till here 189365
      2012-05-24 01:45:48,474 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
      java.io.IOException: Failed setting SPLITTING znode on ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:242)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
      	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      	at java.lang.Thread.run(Thread.java:662)
      Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/bd1079bf948c672e493432020dc0e144
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
      	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
      	at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
      	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:321)
      	at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:659)
      	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:811)
      	at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:747)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.transitionNodeSplitting(SplitTransaction.java:919)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:869)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
      	... 5 more
      2012-05-24 01:45:48,476 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Successful rollback of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.
      
      2012-05-24 01:47:28,141 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/unassigned/bd1079bf948c672e493432020dc0e144 already exists and this is not a retry
      2012-05-24 01:47:28,142 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of ufdr,011365398471659,1337823505339.bd1079bf948c672e493432020dc0e144.; Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
      java.io.IOException: Failed create of ephemeral /hbase/unassigned/bd1079bf948c672e493432020dc0e144
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createNodeSplitting(SplitTransaction.java:865)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:239)
      	at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:450)
      	at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:67)
      

      Due to the above exception, region splitting was failing contineously more than 5hrs

      Attachments

        1. HBASE-6088_trunk.patch
          8 kB
          rajeshbabu
        2. HBASE-6088_trunk_2.patch
          8 kB
          rajeshbabu
        3. HBASE-6088_94.patch
          8 kB
          rajeshbabu
        4. HBASE-6088_trunk_3.patch
          8 kB
          rajeshbabu
        5. HBASE-6088_94_2.patch
          8 kB
          rajeshbabu
        6. HBASE-6088_94_3.patch
          8 kB
          rajeshbabu
        7. HBASE-6088_92.patch
          8 kB
          rajeshbabu
        8. HBASE-6088_trunk_4.patch
          8 kB
          rajeshbabu
        9. addendum_6088_94.patch
          0.9 kB
          ramkrishna.s.vasudevan

        Issue Links

          Activity

            People

              rajesh23 rajeshbabu
              gopinathan.av Gopinathan A
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: