Description
elserj has already summarized it well.
1. hbase:meta was on RS8
2. RS8 crashed, a ServerCrashProcedure (SCP) was queued for it, handling meta first
3. meta was marked OFFLINE
4. meta marked as OPENING on RS3
5. The openRegion RPC could not actually be sent to RS3 because of the Kerberos (krb) ticket issue
6. We attempt the openRegion/assignment 10 times, failing each time
7. We start rolling back the procedure:
2018-10-08 06:51:24,440 WARN [PEWorker-9] procedure2.ProcedureExecutor: Usually this should not happen, we will release the lock before if the procedure is finished, even if the holdLock is true, arrive here means we have some holes where we do not release the lock. And the releaseLock below may fail since the procedure may have already been deleted from the procedure store.
2018-10-08 06:51:24,543 INFO [PEWorker-9] procedure.MasterProcedureScheduler: pid=48, ppid=47, state=FAILED:REGION_TRANSITION_QUEUE, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; AssignProcedure table=hbase:meta, region=1588230740 checking lock on 1588230740
2018-10-08 06:51:30,957 ERROR [PEWorker-9] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=47, state=FAILED:SERVER_CRASH_ASSIGN_META, locked=true, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; ServerCrashProcedure server=<ip-address>,16020,1538974612843, splitWal=true, meta=true
java.lang.UnsupportedOperationException: unhandled state=SERVER_CRASH_GET_REGIONS
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:254)
    at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.rollbackState(ServerCrashProcedure.java:58)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:960)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1577)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1539)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1418)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1981)
DEBUG [PEWorker-2] client.RpcRetryingCallerImpl: Call exception, tries=7, retries=7, started=8168 ms ago, cancelled=false, msg=Meta region is in state OPENING, details=row 'backup:system' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=<hostname>, seqNum=-1, exception=java.io.IOException: Meta region is in state OPENING
    at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$null$1(ZKAsyncRegistry.java:154)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at org.apache.hadoop.hbase.client.ZKAsyncRegistry.lambda$getAndConvert$0(ZKAsyncRegistry.java:77)
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
    at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:165)
    at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:323)
    at java.lang.Thread.run(Thread.java:748)
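The sequence in steps 4 to 7 can be illustrated with a small standalone sketch. This is not HBase code: the class `ScpRollbackSketch`, the enum `RegionState`, and the methods `sendOpenRegion`, `assignWithRetries`, and `rollbackState` are hypothetical stand-ins, and the assumption (suggested by the stack trace, not confirmed here) is that the rollback handler rejects the state it is given. It shows how a region marked OPENING can stay that way when the bounded retries fail and the rollback itself throws.

```java
public class ScpRollbackSketch {
    // Simplified region states; the names mirror the report, not HBase's real enum.
    enum RegionState { OFFLINE, OPENING, OPEN }

    // Hypothetical stand-in for the openRegion RPC. It always fails,
    // mimicking the Kerberos ticket problem from step 5.
    static boolean sendOpenRegion() {
        return false;
    }

    // Marks the region OPENING, then retries the RPC a bounded number of times.
    // Returns the state the region is left in.
    static RegionState assignWithRetries(int maxAttempts) {
        RegionState state = RegionState.OPENING;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sendOpenRegion()) {
                return RegionState.OPEN;
            }
        }
        return state; // still OPENING: nothing reset it after the last failure
    }

    // Assumed behaviour: the parent procedure cannot roll back this state.
    static void rollbackState(String procedureState) {
        throw new UnsupportedOperationException("unhandled state=" + procedureState);
    }

    // Runs the failure path and reports where the region ends up.
    static String run() {
        RegionState meta = assignWithRetries(10);
        if (meta != RegionState.OPEN) {
            try {
                rollbackState("SERVER_CRASH_GET_REGIONS");
            } catch (UnsupportedOperationException e) {
                // The rollback aborts, so the region state is never cleaned up.
                return meta + " after failed rollback: " + e.getMessage();
            }
        }
        return meta.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the region is left in OPENING because neither the exhausted retries nor the aborted rollback moves it to another state, which matches the "Meta region is in state OPENING" errors that clients keep seeing in the last log excerpt.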