HBase / HBASE-21344

hbase:meta location in ZooKeeper set to OPENING by the procedure which eventually failed but precludes Master from assigning it forever

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.3
    • Component/s: proc-v2
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      elserj has already summarized it well.

      1. hbase:meta was hosted on RS8
      2. RS8 crashed and a ServerCrashProcedure (SCP) was queued for it, handling meta first
      3. meta was marked OFFLINE
      4. meta was marked as OPENING on RS3
      5. The openRegion RPC could not actually be sent to RS3 due to the krb (Kerberos) ticket issue
      6. The openRegion/assignment was attempted 10 times, failing each time
      7. The procedure then started rolling back:

      2018-10-08 06:51:24,440 WARN  [PEWorker-9] procedure2.ProcedureExecutor: Usually this should not happen, we will release the lock before if the procedure is finished, even if the holdLock is true, arrive here means we have some holes where we do not release the lock. And the releaseLock below may fail since the procedure may have already been deleted from the procedure store.
      2018-10-08 06:51:24,543 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
      
      2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
      java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
      	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
      	at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
      
      The rollback itself fails, so the meta location in ZooKeeper is left in OPENING and client calls keep failing:

      DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state OPENING, details=row 'backup:system' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1, exception=java.io.IOException: Meta region is in state OPENING
              at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
              at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
              at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
              at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
              at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
              at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
              at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
              at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
              at java.lang.Thread.run(Thread.java:748)
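
      The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HBase code: the class MetaStuckSketch, the in-memory registry map standing in for the ZooKeeper meta location, and the always-failing sendOpenRegion are all hypothetical names invented for illustration. It shows how a state written before a retried RPC can be left behind when the rollback path throws on a state it does not handle.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the failure mode; none of these names are real HBase classes. */
public class MetaStuckSketch {
    enum MetaState { OPEN, OFFLINE, OPENING }
    enum CrashState { GET_REGIONS, ASSIGN_META }

    /** Hypothetical stand-in for the meta location stored in ZooKeeper. */
    static final Map<String, MetaState> registry = new HashMap<>();

    static final int MAX_ATTEMPTS = 10;

    /** Hypothetical stand-in for the openRegion RPC; it always fails here. */
    static void sendOpenRegion(String server) {
        throw new IllegalStateException("cannot reach " + server);
    }

    /** Marks meta OPENING first, then retries the RPC; returns true on success. */
    static boolean assignMeta(String server) {
        registry.put("meta", MetaState.OPENING);
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                sendOpenRegion(server);
                registry.put("meta", MetaState.OPEN);
                return true;
            } catch (IllegalStateException e) {
                // Swallow the failure and try again until attempts run out.
            }
        }
        return false;
    }

    /** Rollback that handles only some states and throws on the rest. */
    static void rollbackState(CrashState state) {
        switch (state) {
            case ASSIGN_META:
                // This toy rollback does not reset the registry entry.
                break;
            default:
                throw new UnsupportedOperationException("unhandled state=" + state);
        }
    }

    /** Runs the whole scenario and returns the state left in the registry. */
    static MetaState run() {
        registry.put("meta", MetaState.OFFLINE);
        if (!assignMeta("RS3")) {
            try {
                // Roll back executed states in reverse order.
                rollbackState(CrashState.ASSIGN_META);
                rollbackState(CrashState.GET_REGIONS);
            } catch (UnsupportedOperationException e) {
                System.out.println("rollback aborted: " + e.getMessage());
            }
        }
        return registry.get("meta");
    }

    public static void main(String[] args) {
        System.out.println("meta state after failed procedure: " + run());
    }
}
```

      In this model nothing ever moves the registry entry away from OPENING once the retries are exhausted, which mirrors the symptom in the client trace. One possible direction for a fix, under the same assumptions, would be to reset or re-queue the assignment when the attempts run out instead of relying on rollback.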
      
      

      Attachments

        1. HBASE-21344-branch-2.0.patch
          24 kB
          Ankit Singhal
        2. HBASE-21344-branch-2.0_v3.patch
          6 kB
          Ankit Singhal
        3. HBASE-21344-branch-2.0_v2.patch
          7 kB
          Ankit Singhal
        4. HBASE-21344.branch-2.0.003-addendum.patch
          3 kB
          Ankit Singhal
        5. HBASE-21344.branch-2.0.003.patch
          6 kB
          Ankit Singhal
        6. HBASE-21344.branch-2.0.001.patch
          7 kB
          Michael Stack

            People

              Assignee: Ankit Singhal (ankit@apache.org)
              Reporter: Ankit Singhal (ankit@apache.org)