  1. HBase
  2. HBASE-19121 HBCK for AMv2 (A.K.A HBCK2)
  3. HBASE-21156

[hbck2] Queue an assign of hbase:meta and bulk assign/unassign

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 3.0.0, 2.2.0, 2.1.1
    • Component/s: hbck2
    • Labels:
      None
    • Release Note:
      Adds 'raw' assigns/unassigns to the Hbck Service. Takes a list of encoded region names and bulk assigns/unassigns. Skirts Master 'state' check and does not invoke Coprocessors. For repair only.

      Here is what HBCK2 usage looks like now:

      {code}
      $ java -cp hbase-hbck2-1.0.0-SNAPSHOT.jar org.apache.hbase.HBCK2
      usage: HBCK2 <OPTIONS> COMMAND [<ARGS>]

      Options:
       -d,--debug run with debug output
       -h,--help output this help message
          --hbase.zookeeper.peerport peerport of target hbase ensemble
          --hbase.zookeeper.quorum ensemble of target hbase
          --zookeeper.znode.parent parent znode of target hbase

      Commands:
       setTableState <TABLENAME> <STATE>
         Possible table states: ENABLED, DISABLED, DISABLING, ENABLING
         To read current table state, in the hbase shell run:
           hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'
         A value of \x08\x00 == ENABLED, \x08\x01 == DISABLED, etc.
         An example making the table named 'users' ENABLED:
           $ HBCK2 setTableState users ENABLED
         Returns whatever the previous table state was.

       assign <ENCODED_REGIONNAME> ...
         A 'raw' assign that can be used even during Master initialization.
         Skirts Coprocessors. Pass one or more encoded RegionNames:
         e.g. 1588230740 is hard-coded encoding for hbase:meta region and
         de00010733901a05f5a2a3a382e27dd4 is an example of what a random
         user-space encoded Region name looks like. For example:
           $ HBCK2 assign 1588230740 de00010733901a05f5a2a3a382e27dd4
         Returns the pid of the created AssignProcedure or -1 if none.

       unassign <ENCODED_REGIONNAME> ...
         A 'raw' unassign that can be used even during Master initialization.
         Skirts Coprocessors. Pass one or more encoded RegionNames:
         e.g. 1588230740 is hard-coded encoding for hbase:meta region and
         de00010733901a05f5a2a3a382e27dd4 is an example of what a random
         user-space encoded Region name looks like. For example:
           $ HBCK2 unassign 1588230740 de00010733901a05f5a2a3a382e27dd4
         Returns the pid of the created UnassignProcedure or -1 if none.
      {code}
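
Because the raw assign/unassign commands skirt the Master 'state' check, it may be worth checking arguments before passing them on. The following is a minimal sketch of such a pre-check, not part of HBCK2: the function name `is_encoded_region_name` is hypothetical, and the rule that user-space encoded names are 32 lowercase hex characters is an assumption based on the example name shown in the usage above.

```shell
# Hypothetical helper, not part of HBCK2: returns 0 if the argument looks like
# an encoded region name suitable for 'HBCK2 assign' / 'HBCK2 unassign'.
is_encoded_region_name() {
  # 1588230740 is the hard-coded encoded name of the hbase:meta region.
  case "$1" in
    1588230740) return 0 ;;
  esac
  # Assumption: user-space encoded names are exactly 32 characters long...
  [ "${#1}" -eq 32 ] || return 1
  # ...and contain only lowercase hex digits.
  case "$1" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}
```

A caller could skip or report any argument for which the function returns non-zero, for example a full region name pasted by mistake instead of its encoded form.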

      Description

      We need this to effect repair when there is damage.

      If the procedure WALs AND a server WAL dir are lost or cleaned, or if we crashed during a partial split (unlikely scenarios but nonetheless possible), a Master can get stuck, unable to become active, because there is no assign procedure for hbase:meta in the system.

      The reasonable argument over in HBASE-21035 has it that attempts at auto-repair under these extremes could cause other issues, so, at least until we learn more, we punt to the operator for fix-up.

      To reproduce the catastrophe, see notes in HBASE-21035 (and Allan Yang's test).

      UPDATE: HBASE-21191 makes the Master assume a "holding pattern" if on startup it does not have an assign for meta (possible if we lose all Master WAL Procs). The holding pattern is needed because we were exiting after one minute of RPC'ing to the old meta location. Admin#assign cannot be used to inject an assign because it is rejected with "Master is Initializing". So we need to be able to assign hbase:meta even while the Master is initializing. Also, while in here, add bulk assign, because assigning a Region at a time from the shell only works if the offlined region count is in the low 10s; it fails when thousands are offline.
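
When thousands of regions are offline, the bulk assign can be driven from a file of encoded region names. The following is a rough sketch under stated assumptions, not a tested repair procedure: the function `bulk_assign`, the variables `HBCK2_CMD` and `BATCH_SIZE`, and the batch size of 100 are all hypothetical. Only the `assign <ENCODED_REGIONNAME> ...` form and the jar/class invocation come from the usage text in the release note.

```shell
# Assumption: how HBCK2 is launched; taken from the usage text, override as needed.
HBCK2_CMD="${HBCK2_CMD:-java -cp hbase-hbck2-1.0.0-SNAPSHOT.jar org.apache.hbase.HBCK2}"
# Assumption: number of encoded region names passed per HBCK2 invocation.
BATCH_SIZE="${BATCH_SIZE:-100}"

# Hypothetical wrapper: $1 is a file with one encoded region name per line.
# Blank lines are dropped, then xargs runs one 'assign' per batch of names.
bulk_assign() {
  # HBCK2_CMD is intentionally unquoted so it splits into command and arguments.
  grep -v '^[[:space:]]*$' "$1" | xargs -n "$BATCH_SIZE" $HBCK2_CMD assign
}
```

Calling `bulk_assign offline-regions.txt` (a hypothetical file name) would then issue one `assign` per batch rather than one per region. Swapping `assign` for `unassign` in the function gives the bulk unassign counterpart.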

              People

              • Assignee: stack
              • Reporter: stack