Description
I am not sure if this bug was already raised in JIRA.
In our test cluster we had a scenario where an RS went down and ServerShutDownHandler started splitLog.
But because HDFS was down, the waitOnSafeMode check threw an IOException.
try {
  // If FS is in safe mode, just wait till out of it.
  FSUtils.waitOnSafeMode(conf, conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
  splitter.splitLog();
} catch (OrphanHLogAfterSplitException e) {
We catch the exception here and only log it:
} catch (IOException e) {
  checkFileSystem();
  LOG.error("Failed splitting " + logDir.toString(), e);
}
So the HLog split itself never happened. As a result, we lost about 4 regions that had recently been split on the crashed RS.
Can we abort the Master in such scenarios? Please suggest.
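
To make the question concrete, here is a minimal, self-contained sketch of the behaviour I have in mind. It is not the actual HBase code: LogSplitter, Abortable, RecordingMaster and splitLogOrAbort are hypothetical stand-ins defined only for this illustration, and the log directory name is made up. The only point is that an IOException during the split leads to an abort request instead of just a log line.

```java
import java.io.IOException;

/** Illustration only; every type here is a hypothetical stand-in, not an HBase class. */
class SplitLogAbortSketch {

  /** Stand-in for whatever performs the HLog split. */
  interface LogSplitter {
    void splitLog() throws IOException;
  }

  /** Stand-in for the master's abort hook. */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Records abort requests so the behaviour can be observed. */
  static class RecordingMaster implements Abortable {
    boolean aborted = false;
    String reason = null;

    @Override
    public void abort(String why, Throwable cause) {
      aborted = true;
      reason = why;
    }
  }

  /**
   * Runs the split. Returns true if it completed; on IOException it asks the
   * master to abort (rather than only logging) and returns false.
   */
  static boolean splitLogOrAbort(LogSplitter splitter, Abortable master, String logDir) {
    try {
      splitter.splitLog();
      return true;
    } catch (IOException e) {
      master.abort("Failed splitting " + logDir, e);
      return false;
    }
  }

  public static void main(String[] args) {
    RecordingMaster master = new RecordingMaster();
    // Simulate HDFS being down: the split throws an IOException.
    boolean ok = splitLogOrAbort(
        () -> { throw new IOException("HDFS is down"); },
        master,
        "example-log-dir");
    System.out.println("split ok=" + ok + ", aborted=" + master.aborted);
  }
}
```

Whether aborting is the right reaction, or whether the split should instead be retried until HDFS is back, is exactly what I would like feedback on.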