HBase / HBASE-6012

Handling RegionOpeningState for bulk assign

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.95.2
    • Fix Version/s: 0.95.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Tags:
      0.96notable

      Description

      Since HBASE-5914, bulk assign is used by SSH (ServerShutdownHandler).

      In the bulk assign case, however, if the region server reports ALREADY_OPENED for a region, nothing clears the znode that bulk assign created for it.

      In addition, when a region server (RS) opens a list of regions and one of them is already in transition, it throws RegionAlreadyInTransitionException and stops opening the remaining regions.
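
The two problems described above can be illustrated with a minimal, self-contained Java sketch. It does not use any HBase classes and is not the attached patch: `OpeningState`, `RegionServer`, `openRegion`, `openRegions`, and `bulkAssign` are hypothetical stand-ins, and the znodes are modeled as an in-memory set. Under those assumptions it shows one possible handling: the region server records a state per region instead of aborting the whole batch, and the master removes the znode of any region reported as ALREADY_OPENED.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BulkAssignSketch {
    /** Hypothetical per-region outcome of an open request. */
    enum OpeningState { OPENED, ALREADY_OPENED, FAILED_OPENING }

    /** In-memory stand-in for a region server; not an HBase class. */
    static final class RegionServer {
        final Set<String> online = new HashSet<>();
        final Set<String> inTransition = new HashSet<>();

        /** Opens one region, throwing if it is already in transition. */
        OpeningState openRegion(String region) {
            if (inTransition.contains(region)) {
                throw new IllegalStateException(region + " is already in transition");
            }
            // Set.add returns false when the region was already online.
            return online.add(region) ? OpeningState.OPENED : OpeningState.ALREADY_OPENED;
        }

        /** Opens every region; one failure does not stop the rest of the batch. */
        Map<String, OpeningState> openRegions(List<String> regions) {
            Map<String, OpeningState> states = new LinkedHashMap<>();
            for (String region : regions) {
                try {
                    states.put(region, openRegion(region));
                } catch (IllegalStateException e) {
                    states.put(region, OpeningState.FAILED_OPENING);
                }
            }
            return states;
        }
    }

    /**
     * Master side: "creates" one znode per region, sends the bulk open, then
     * deletes the znode of every region reported as ALREADY_OPENED.
     * Returns the znodes that remain for the normal open or retry handling.
     */
    static Set<String> bulkAssign(RegionServer rs, List<String> regions) {
        Set<String> znodes = new HashSet<>(regions);
        for (Map.Entry<String, OpeningState> e : rs.openRegions(regions).entrySet()) {
            if (e.getValue() == OpeningState.ALREADY_OPENED) {
                znodes.remove(e.getKey());
            }
        }
        return znodes;
    }

    public static void main(String[] args) {
        RegionServer rs = new RegionServer();
        rs.online.add("r1");        // already open on this server
        rs.inTransition.add("r2");  // currently in transition
        System.out.println(bulkAssign(rs, Arrays.asList("r1", "r2", "r3")));
    }
}
```

In this sketch the znode for `r1` is cleaned up, `r2` is marked as failed without blocking `r3`, and `r3` is opened normally. How the real fix reports and retries failed regions is not shown here.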

      1. HBASE-6012v8.patch
        16 kB
        chunhui shen
      2. HBASE-6012v7.patch
        16 kB
        chunhui shen
      3. HBASE-6012v6.patch
        15 kB
        chunhui shen
      4. HBASE-6012v5.patch
        15 kB
        chunhui shen
      5. HBASE-6012v4.patch
        15 kB
        chunhui shen
      6. HBASE-6012v3.patch
        10 kB
        chunhui shen
      7. HBASE-6012v2.patch
        2 kB
        chunhui shen
      8. HBASE-6012.patch
        4 kB
        chunhui shen

          Activity

          chunhui shen created issue
          chunhui shen made changes
          Field Original Value New Value
          Attachment HBASE-6012.patch [ 12527558 ]
          chunhui shen made changes
          Description (original value):
          As the method's javadoc and log message indicate:
          {code}
          /**
             * Set region as OFFLINED up in zookeeper asynchronously.
             */
          boolean asyncSetOfflineInZooKeeper(
          ...
          master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
          ...
          }
          {code}
          I think AssignmentManager#asyncSetOfflineInZooKeeper should also force the node offline, just as AssignmentManager#setOfflineInZooKeeper does. Otherwise, a bulk assign that calls this method may fail.
          Description (new value):
          As the method's javadoc and log message indicate:
          {code}
          /**
             * Set region as OFFLINED up in zookeeper asynchronously.
             */
          boolean asyncSetOfflineInZooKeeper(
          ...
          master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
          ...
          }
          {code}
          I think AssignmentManager#asyncSetOfflineInZooKeeper should also force the node offline, just as AssignmentManager#setOfflineInZooKeeper does. Otherwise, a bulk assign that calls this method may fail.


          Error log on the master caused by the issue

          2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,1YTQDPGLXBTICHOPQ6IL,1336590857771.674da422fc7cb9a7d42c74499ace1d93. state=PENDING_CLOSE, ts=1336757876856
          2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x23736bf74780082 Async create of unassigned node for 674da422fc7cb9a7d42c74499ace1d93 with OFFLINE state
          2012-05-12 01:40:09,446 WARN org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rc != 0 for /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93 -- retryable connectionloss -- FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2
          2012-05-12 01:40:09,447 FATAL org.apache.hadoop.hbase.master.HMaster: Connectionloss writing unassigned at /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93, rc=-110
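
The `rc=-110` in the FATAL line is worth decoding. In ZooKeeper's `KeeperException.Code`, -110 is NODEEXISTS and CONNECTIONLOSS is -4, which suggests that the async create failed because the unassigned znode already existed rather than because of a connection loss. The sketch below is a synchronous, in-memory illustration of the difference between a plain create and a create that forces the node to OFFLINE. The class, `describe`, `createOffline`, and `createOrForceOffline` are hypothetical names, the znode store is a plain map, and the "CLOSING" state string is only a placeholder; the real method is asynchronous and callback-based.

```java
import java.util.HashMap;
import java.util.Map;

class ForceOfflineSketch {
    // Values of ZooKeeper's KeeperException.Code for the cases discussed here.
    static final int OK = 0;
    static final int CONNECTIONLOSS = -4;
    static final int NODEEXISTS = -110;

    /** Hypothetical helper that names a result code. */
    static String describe(int rc) {
        switch (rc) {
            case OK: return "OK";
            case CONNECTIONLOSS: return "CONNECTIONLOSS";
            case NODEEXISTS: return "NODEEXISTS";
            default: return "OTHER(" + rc + ")";
        }
    }

    /** Plain create: fails with NODEEXISTS if the znode is already there. */
    static int createOffline(Map<String, String> znodes, String region) {
        if (znodes.containsKey(region)) {
            return NODEEXISTS;
        }
        znodes.put(region, "OFFLINE");
        return OK;
    }

    /** Create, and on NODEEXISTS overwrite the existing state with OFFLINE. */
    static int createOrForceOffline(Map<String, String> znodes, String region) {
        int rc = createOffline(znodes, region);
        if (rc == NODEEXISTS) {
            znodes.put(region, "OFFLINE");
            return OK;
        }
        return rc;
    }

    public static void main(String[] args) {
        Map<String, String> znodes = new HashMap<>();
        String region = "674da422fc7cb9a7d42c74499ace1d93";
        znodes.put(region, "CLOSING"); // leftover znode from an earlier transition
        System.out.println(describe(createOffline(znodes, region)));
        System.out.println(describe(createOrForceOffline(znodes, region)));
    }
}
```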
          chunhui shen made changes
          Link This issue relates to HBASE-5914 [ HBASE-5914 ]
          ramkrishna.s.vasudevan made changes
          Fix Version/s 0.96.0 [ 12320040 ]
          Affects Version/s 0.96.0 [ 12320040 ]
          chunhui shen made changes
          Attachment HBASE-6012v2.patch [ 12530718 ]
          chunhui shen made changes
          Summary AssignmentManager#asyncSetOfflineInZooKeeper wouldn't force node offline -> Handling RegionOpeningState for bulk assign since SSH using
          Description (original value):
          As the method's javadoc and log message indicate:
          {code}
          /**
             * Set region as OFFLINED up in zookeeper asynchronously.
             */
          boolean asyncSetOfflineInZooKeeper(
          ...
          master.abort("Unexpected ZK exception creating/setting node OFFLINE", e);
          ...
          }
          {code}
          I think AssignmentManager#asyncSetOfflineInZooKeeper should also force the node offline, just as AssignmentManager#setOfflineInZooKeeper does. Otherwise, a bulk assign that calls this method may fail.


          Error log on the master caused by the issue

          2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,1YTQDPGLXBTICHOPQ6IL,1336590857771.674da422fc7cb9a7d42c74499ace1d93. state=PENDING_CLOSE, ts=1336757876856
          2012-05-12 01:40:09,437 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x23736bf74780082 Async create of unassigned node for 674da422fc7cb9a7d42c74499ace1d93 with OFFLINE state
          2012-05-12 01:40:09,446 WARN org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rc != 0 for /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93 -- retryable connectionloss -- FIX see http://wiki.apache.org/hadoop/ZooKeeper/FAQ#A2
          2012-05-12 01:40:09,447 FATAL org.apache.hadoop.hbase.master.HMaster: Connectionloss writing unassigned at /hbase-func1/unassigned/674da422fc7cb9a7d42c74499ace1d93, rc=-110
          Description (new value):
          Since HBASE-5914, bulk assign is used by SSH (ServerShutdownHandler).

          In the bulk assign case, however, if the region server reports ALREADY_OPENED for a region, nothing clears the znode that bulk assign created for it.

          In addition, when a region server (RS) opens a list of regions and one of them is already in transition, it throws RegionAlreadyInTransitionException and stops opening the remaining regions.
          chunhui shen made changes
          Attachment HBASE-6012v3.patch [ 12531084 ]
          chunhui shen made changes
          Attachment HBASE-6012v4.patch [ 12531213 ]
          chunhui shen made changes
          Attachment HBASE-6012v5.patch [ 12531218 ]
          ramkrishna.s.vasudevan made changes
          Status Open [ 1 ] -> Patch Available [ 10002 ]
          chunhui shen made changes
          Attachment HBASE-6012v6.patch [ 12531650 ]
          Ted Yu made changes
          Summary Handling RegionOpeningState for bulk assign since SSH using -> Handling RegionOpeningState for bulk assign
          chunhui shen made changes
          Attachment HBASE-6012v7.patch [ 12531766 ]
          chunhui shen made changes
          Attachment HBASE-6012v8.patch [ 12531768 ]
          Ted Yu made changes
          Status Patch Available [ 10002 ] -> Resolved [ 5 ]
          Hadoop Flags Reviewed [ 10343 ]
          Resolution Fixed [ 1 ]
          Sergey Shelukhin made changes
          Link This issue relates to HBASE-7039 [ HBASE-7039 ]
          stack made changes
          Fix Version/s 0.95.0 [ 12324094 ]
          Fix Version/s 0.96.0 [ 12320040 ]
          stack made changes
          Fix Version/s 0.98.0 [ 12323143 ]
          stack made changes
          Fix Version/s 0.98.0 [ 12323143 ]
          stack made changes
          Status Resolved [ 5 ] -> Closed [ 6 ]
          stack made changes
          Tags 0.96notable

            People

            • Assignee: chunhui shen
            • Reporter: chunhui shen
            • Votes: 0
            • Watchers: 8
