Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Not A Problem
Description
(This description got redone after I figured out what was going on. Previously it was just a litany of me banging around trying to learn procedure-based WAL splitting and hbase.wal.split.to.hfile; no one needs to read that; hence the refactor).
Procedure-based distributed WAL splitting (HBASE-24574) is enabled, and so is split-to-hfile (hbase.wal.split.to.hfile). A forced crash requires recovery on restart, with ServerCrashProcedure splitting the old WALs. The recovery fails because we get stuck. The Master can't assign meta because meta is being recovered. The recovery can't make progress because it is asking for a table descriptor for meta (needed by the hbase.wal.split.to.hfile feature) and the Master is not yet initialized. After the default timeout, the Master shuts down because it can't initialize.
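The circular wait described above can be sketched as a small wait-for graph. This is only a toy model of my reading of the description, not HBase code: the class name RecoveryWaitGraph, its methods, and the step labels are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the wait-for chain; hypothetical, not part of HBase. */
final class RecoveryWaitGraph {
    // Each step maps to the single step it is blocked on (labels are illustrative).
    private final Map<String, String> waitsOn = new LinkedHashMap<>();

    void addWait(String step, String blockedOn) {
        waitsOn.put(step, blockedOn);
    }

    /** Follows the chain from start; returns the cycle if one is reached, else an empty list. */
    List<String> findCycle(String start) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        while (current != null && seen.add(current)) {
            path.add(current);
            current = waitsOn.get(current);
        }
        if (current == null) {
            return new ArrayList<>(); // the chain ended, so nothing is stuck
        }
        // current is the first step seen twice; the cycle starts there.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    /** The standalone-mode scenario from the description, as assumed here. */
    static RecoveryWaitGraph standaloneScenario() {
        RecoveryWaitGraph g = new RecoveryWaitGraph();
        g.addWait("master initialized", "meta assigned");
        g.addWait("meta assigned", "meta WAL split");
        g.addWait("meta WAL split", "meta table descriptor");
        g.addWait("meta table descriptor", "master initialized");
        return g;
    }

    public static void main(String[] args) {
        List<String> cycle = standaloneScenario().findCycle("master initialized");
        System.out.println(String.join(" -> ", cycle));
    }
}
```

Every step in the chain is waiting on another step in the same chain, so nothing short of the timeout breaks it. Removing any one edge (for example, reading the meta table descriptor without going through the Master) would let the chain terminate.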
2020-06-18 19:53:54,175 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
    at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:232)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:200)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:430)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3059)
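For readers unfamiliar with where the "not initialized after 200000ms" message comes from, here is a rough sketch of a bounded wait that fails this way. It is an assumption about the general shape only, not the actual JVMClusterUtil code; InitWaitSketch, awaitOrFail, and the parameter names are hypothetical.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical bounded wait; illustrates the failure mode, not HBase's implementation. */
final class InitWaitSketch {

    /**
     * Polls done until it returns true. Throws a RuntimeException with a message of the
     * form "<what> after <timeoutMs>ms" if the deadline passes first.
     */
    static void awaitOrFail(long timeoutMs, long pollMs, String what, BooleanSupplier done) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new RuntimeException(what + " after " + timeoutMs + "ms");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                throw new RuntimeException("Interrupted while waiting: " + what, e);
            }
        }
    }

    public static void main(String[] args) {
        // A condition that never becomes true stands in for a Master stuck on meta recovery.
        try {
            awaitOrFail(100, 10, "Master not initialized", () -> false);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the condition can never become true while the recovery is stuck, the wait always runs to the deadline, which is why the failure shows up as a timeout rather than as an error from the WAL split itself.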
The abort of the Master interrupts other ongoing actions, so later in the log we'll see the WAL split show as interrupted:
2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
This issue becomes how to make hbase.wal.split.to.hfile work in standalone mode.
Issue Links
- relates to
  - HBASE-23739 BoundedRecoveredHFilesOutputSink should read the table descriptor directly (Resolved)
  - HBASE-24574 Procedure V2 - Distributed WAL Splitting => LOGGING (Resolved)
  - HBASE-19216 Implement a general framework to execute remote procedure on RS (Closed)
  - HBASE-24766 Document Remote Procedure Execution (Closed)