  1. HBase
  2. HBASE-5137

MasterFileSystem.splitLog() should abort even if waitOnSafeMode() throws IOException

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.6, 0.92.0
    • Component/s: None
    • Labels: None

    Description

      I am not sure whether this bug has already been raised in JIRA.
      In our test cluster, a region server (RS) went down and ServerShutdownHandler started splitting its logs.
      Because HDFS was down, the waitOnSafeMode check threw an IOException.

      try {
        // If FS is in safe mode, just wait till out of it.
        FSUtils.waitOnSafeMode(conf,
          conf.getInt(HConstants.THREAD_WAKE_FREQUENCY, 1000));
        splitter.splitLog();
      } catch (OrphanHLogAfterSplitException e) {
      

      The IOException is caught here and only logged:

      } catch (IOException e) {
        checkFileSystem();
        LOG.error("Failed splitting " + logDir.toString(), e);
      }
      

      So the HLog split itself never happened. We found that about 4 regions that had recently been split on the crashed RS were lost.

      Can we abort the Master in such scenarios? Please suggest.
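
To make the request concrete, here is a minimal, self-contained sketch of the two behaviors: swallowing the IOException (as in the snippets above) versus asking the master to abort. It is not the attached patch and does not use HBase classes. `SplitLogAbortSketch`, `Abortable`, `IoStep` and `RecordingMaster` are hypothetical stand-ins introduced only for illustration, and the abort is recorded rather than performed so the sketch can run anywhere.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the log-splitting path; not HBase code. */
class SplitLogAbortSketch {

    /** Stand-in for the master's abort hook (assumed shape, for illustration). */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** A step that may fail with IOException, e.g. the safe-mode wait or the split. */
    interface IoStep {
        void run() throws IOException;
    }

    /** Records abort requests instead of stopping the JVM. */
    static final class RecordingMaster implements Abortable {
        final List<String> abortReasons = new ArrayList<>();

        @Override
        public void abort(String why, Throwable cause) {
            abortReasons.add(why + ": " + cause.getMessage());
        }

        boolean isAborted() {
            return !abortReasons.isEmpty();
        }
    }

    /** Behavior described in the report: the failure is logged and the caller carries on. */
    static boolean splitLogSwallowing(IoStep waitOnSafeMode, IoStep split) {
        try {
            waitOnSafeMode.run();
            split.run();
            return true;
        } catch (IOException e) {
            // Only logged; nothing upstream learns that the split was skipped.
            System.err.println("Failed splitting: " + e.getMessage());
            return false;
        }
    }

    /** Requested behavior: any IOException on this path asks the master to abort. */
    static boolean splitLogOrAbort(IoStep waitOnSafeMode, IoStep split, Abortable master) {
        try {
            waitOnSafeMode.run();
            split.run();
            return true;
        } catch (IOException e) {
            master.abort("Failed splitting logs", e);
            return false;
        }
    }

    public static void main(String[] args) {
        IoStep hdfsDown = () -> { throw new IOException("HDFS unreachable"); };
        IoStep split = () -> System.out.println("splitting logs");
        RecordingMaster master = new RecordingMaster();

        boolean ran = splitLogOrAbort(hdfsDown, split, master);
        // Prints: split ran: false, aborted: true
        System.out.println("split ran: " + ran + ", aborted: " + master.isAborted());
    }
}
```

In the swallowing variant the split is skipped silently, which matches the lost-regions symptom described above. In the aborting variant the same failure is surfaced through the abort hook, so the skipped split cannot go unnoticed.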

      Attachments

        1. 5137-trunk.txt
          0.8 kB
          Ted Yu
        2. HBASE-5137.patch
          3 kB
          ramkrishna.s.vasudevan
        3. HBASE-5137.patch
          3 kB
          ramkrishna.s.vasudevan

        Activity

          People

            Assignee: ramkrishna.s.vasudevan (ram_krish)
            Reporter: ramkrishna.s.vasudevan (ram_krish)
            Votes: 0
            Watchers: 3
