HBase / HBASE-21344

hbase:meta location in ZooKeeper is set to OPENING by a procedure that eventually fails, which prevents the Master from ever assigning it

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.3
    • Component/s: proc-v2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Josh Elser has already summarized it well.

      1. hbase:meta was on RS8
      2. RS8 crashed and a ServerCrashProcedure (SCP) was queued for it, handling meta first
      3. meta was marked OFFLINE
      4. meta was marked as OPENING on RS3
      5. The openRegion RPC could not actually be sent to RS3 because of the krb (Kerberos) ticket issue
      6. We attempted the openRegion/assignment 10 times, failing each time
      7. We start rolling back the procedure:

      2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: Usually this should not happen, we will release the lock before if the procedure is finished, even if the holdLock is true, arrive here means we have some holes where we do not release the lock. And the releaseLock below may fail since the procedure may have already been deleted from the procedure store.
      2018-10-08 06:51:24,543 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
      
      2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
      java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
      	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
      	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
      
      DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state OPENING, details=row 'backup:system' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1, exception=java.io.IOException: Meta region is in state OPENING
              at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
              at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
              at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
              at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
              at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
              at java.lang.Thread.run(Thread.java:748)
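
      The sequence in the description can be approximated with a small, self-contained model. This is only a sketch under stated assumptions, not HBase code and not the fix attached to this issue: FakeZk, sendOpenRegion, assignMeta, rollback and simulate are all hypothetical names invented for illustration. It shows the ordering that matters: OPENING is published before the open RPC succeeds, and a rollback that throws on an unhandled state leaves that published value in place.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Toy model of the failure sequence described in this issue; all names are hypothetical. */
class MetaStuckSketch {
    enum RegionState { OFFLINE, OPENING, OPEN }

    /** Stand-in for the hbase:meta location/state published in ZooKeeper. */
    static final class FakeZk {
        final AtomicReference<RegionState> metaState =
            new AtomicReference<>(RegionState.OPEN);
    }

    /** Stand-in for the openRegion RPC; it always fails, like the krb ticket issue. */
    static void sendOpenRegion(String server) {
        throw new IllegalStateException("cannot reach " + server);
    }

    /** Publishes OPENING, then retries the RPC; returns the number of attempts made. */
    static int assignMeta(FakeZk zk, String target, int maxAttempts) {
        zk.metaState.set(RegionState.OFFLINE);  // step 3: meta marked OFFLINE
        zk.metaState.set(RegionState.OPENING);  // step 4: published before the RPC succeeds
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                sendOpenRegion(target);
                zk.metaState.set(RegionState.OPEN);
                return attempts;
            } catch (IllegalStateException e) {
                // steps 5 and 6: the RPC cannot be sent, so try again
            }
        }
        return attempts;  // retries exhausted, state is still OPENING
    }

    /** Rollback that does not know how to undo the given step (step 7). */
    static void rollback(String state) {
        throw new UnsupportedOperationException("unhandled state=" + state);
    }

    /** Runs the whole sequence and returns the state left behind in the fake ZooKeeper. */
    static RegionState simulate(int maxAttempts) {
        FakeZk zk = new FakeZk();
        assignMeta(zk, "RS3", maxAttempts);
        try {
            rollback("SERVER_CRASH_GET_REGIONS");
        } catch (UnsupportedOperationException e) {
            // the rollback aborts here, so nothing resets the published state
        }
        return zk.metaState.get();
    }

    public static void main(String[] args) {
        // prints "meta state after failed rollback: OPENING"
        System.out.println("meta state after failed rollback: " + simulate(10));
    }
}
```

      In this model any reader of the fake ZooKeeper state keeps seeing OPENING after the procedure has given up, which mirrors the "Meta region is in state OPENING" client exception above.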
      
      

        Attachments

        1. HBASE-21344.branch-2.0.003-addendum.patch
          3 kB
          Ankit Singhal
        2. HBASE-21344.branch-2.0.001.patch
          7 kB
          Michael Stack
        3. HBASE-21344.branch-2.0.003.patch
          6 kB
          Ankit Singhal
        4. HBASE-21344-branch-2.0_v3.patch
          6 kB
          Ankit Singhal
        5. HBASE-21344-branch-2.0_v2.patch
          7 kB
          Ankit Singhal
        6. HBASE-21344-branch-2.0.patch
          24 kB
          Ankit Singhal

              People

              • Assignee:
                Ankit Singhal (ankit@apache.org)
              • Reporter:
                Ankit Singhal (ankit@apache.org)
              • Votes:
                0
              • Watchers:
                5