Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      While testing a cluster, we hit the following assert during region assignment. We were killing the master during a long run of splits. We think what happened is that the HMaster was killed while splitting, came back up, and split again. If this happens, we will have two files: one partially written and one complete. Since encountering partial log splits upon master failure is considered normal behavior, we should continue at the RS level if we encounter an EOFException and not a filesystem-level exception, even with skip.errors == false.

      2010-08-20 16:59:07,718 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening MailBox_dsanduleac,57db45276ece7ce03ef7e8d9969eb189:99900000000008@facebook.com,1280960828959.7c542d24d4496e273b739231b01885e6.
      java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:375)
      at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1902)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1932)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1837)
      at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1883)
      at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:121)
      at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:113)
      at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1981)
      at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1956)
      at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1915)
      at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:344)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1490)
      at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1437)
      at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1345)
      at java.lang.Thread.run(Thread.java:619)
      2010-08-20 16:59:07,719 ERROR org.apache.hadoop.hbase.regionserver.RSZookeeperUpdater: Aborting open of region 7c542d24d4496e273b739231b01885e6
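
      A minimal sketch of the behavior argued for above, assuming a simplified reader interface; EditReader, replay, and the class name are illustrative, not the actual HBase API. The idea is to treat an EOFException from a recovered-edits file as the end of a truncated split (expected after a master crash) rather than a fatal error that aborts the region open, even with skip.errors == false, while still surfacing other IOExceptions.

          import java.io.EOFException;
          import java.io.IOException;

          final class RecoveredEditsReplaySketch {

            /** Stand-in for the real edits reader; illustrative only. */
            interface EditReader {
              /** Returns the next edit, or null at a clean end of file. */
              Object next() throws IOException;
            }

            /**
             * Replays edits until the reader is exhausted. A partially written
             * split file (e.g. the master was killed mid-split) surfaces as an
             * EOFException; stop replaying instead of aborting the region open.
             */
            static long replay(EditReader reader) throws IOException {
              long editsReplayed = 0;
              while (true) {
                Object edit;
                try {
                  edit = reader.next();
                } catch (EOFException eof) {
                  // Truncated split file: expected after a master crash, not fatal.
                  break;
                }
                if (edit == null) {
                  break; // clean end of file
                }
                // ... apply the edit to the region here ...
                editsReplayed++;
              }
              return editsReplayed;
            }
          }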

      1. HBASE-2933.patch
        6 kB
        Nicolas Spiegelberg

        Issue Links

          Activity

          tlipcon Todd Lipcon added a comment -

          Hrm, isn't this basically the same as that other ticket about log splitting? I.e., if the master gets killed during splitting, the next master should completely re-split and the half-done log splits should be removed, right?

          nspiegelberg Nicolas Spiegelberg added a comment -

          Linking related issues that I found. @todd: I don't recall seeing one about deleting the old logs at Master startup. I remember addressing it in HBASE-2337, so maybe it didn't get properly ported to 0.90. I agree that we should delete old split logs on master startup + split. Should we also make sure that any EOF exceptions don't force the region offline, especially if they are still archived?

          tlipcon Todd Lipcon added a comment -

          I can't remember the particular JIRA either, but it seems to me that the regionserver shouldn't even get to the point of doing recovery if the logs haven't been completely split. I.e., the phases should be:

          1) Original RS is writing logs and dies
          2) Master A notices failure and starts splitting logs. It gets halfway through writing region_1/oldlog
          3) Master A dies
          4) Master B takes over, and knows from ZK that RS's recovery is incomplete.
          5) Master B should remove the half-written log split done by Master A, and try again from the start.

          I.e., no region server should attempt to open region 1 until the logs have been properly split. Thus, the RS should never see an EOFException on log recovery, since it would indicate that log splitting is incomplete.

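          A rough sketch of step 5, assuming split output is written under a per-region recovered.edits directory and that a half-written output can simply be deleted before re-splitting; the class, method, and path layout here are illustrative, not the actual master code.

              import java.io.IOException;
              import org.apache.hadoop.fs.FileStatus;
              import org.apache.hadoop.fs.FileSystem;
              import org.apache.hadoop.fs.Path;

              final class ResplitCleanupSketch {
                /**
                 * Before re-splitting a dead RS's logs, remove any split output a
                 * previous master may have left half-written, so the re-split
                 * starts from a clean slate.
                 */
                static void deleteOldSplitOutput(FileSystem fs, Path recoveredEditsDir)
                    throws IOException {
                  if (!fs.exists(recoveredEditsDir)) {
                    return; // nothing left over from a previous attempt
                  }
                  for (FileStatus edits : fs.listStatus(recoveredEditsDir)) {
                    // Possibly partial output of an earlier, interrupted split.
                    fs.delete(edits.getPath(), false);
                  }
                }
              }
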
          streamy Jonathan Gray added a comment -

          I believe this happened when a cluster was being forcibly shut down in the middle of log splitting on the master. The new master came up acting like a cluster startup, not a failover. I'm not even sure whether this makes a difference in the current master, but it might in the new master. I guess in both cases we should be doing the same log splitting checks?

          tlipcon Todd Lipcon added a comment -

          Yep - even with good working master failover, you might lose power on the cluster, in which case all the masters would die. We'd have a "clean startup" but to not lose data we have to split all the logs (woo!)

          stack stack added a comment -

          Bringing into 0.90. Need to write a test to ensure that the new master removes old partially split logs if the old master died mid-split; also make it so we don't die if the RS gets an EOF (though, as Todd says, this should never happen if the split was done properly). We should also keep going if we get something like an IOE "File is corrupt!" (see below):

              private synchronized int readRecordLength() throws IOException {
                if (in.getPos() >= end) {
                  return -1;
                }
                int length = in.readInt();
                if (version > 1 && sync != null &&
                    length == SYNC_ESCAPE) {              // process a sync entry
                  in.readFully(syncCheck);                // read syncCheck
                  if (!Arrays.equals(sync, syncCheck))    // check it
                    throw new IOException("File is corrupt!");
                  syncSeen = true;
                  if (in.getPos() >= end) {
                    return -1;
                  }
                  length = in.readInt();                  // re-read length
                } else {
                  syncSeen = false;
                }
          
                return length;
              }
          
          stack stack added a comment -

          Looking in logs I see this kinda thing:

          2010-09-07 18:10:27,965 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Found existing old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://sv4borg9:9000/hbase/api_access_token_stats_day/1845102219/recovered.edits/0000000068762427569, length=264167
          

          ... so we're already doing some cleanup of old split attempts.

          nspiegelberg Nicolas Spiegelberg added a comment -

          Handles EOFE and the IOE that was mentioned by stack. The SequenceFile.Reader has a few more IOEs, so this isn't 100% fail-proof. The general problem we seem to have is that we need to differentiate between a network IOE and a file-format IOE. A file-format IOE is idempotent, whereas a network IOE may not be.

          Network = we need to fail and let another server try to take over.
          FileFormat = our file was written or parsed incorrectly; retrying won't fix anything. We need to just open what we have and store the original file away for later analysis.

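          A hedged sketch of that distinction; the helper name and the message check are illustrative, and the committed patch may detect these cases differently. The point is that a parse-level failure such as an EOFException, or the "File is corrupt!" IOE thrown by readRecordLength above, is deterministic (retrying on another server hits the same bytes), whereas other IOExceptions may be transient and should fail the attempt.

              import java.io.EOFException;
              import java.io.IOException;

              final class IoeClassifierSketch {
                /**
                 * True if the exception looks like a file-format (parse) problem:
                 * replay what we can and archive the file for later analysis.
                 * False means the error may be transient (e.g. network), so fail
                 * and let another server retry.
                 */
                static boolean isFileFormatProblem(IOException ioe) {
                  if (ioe instanceof EOFException) {
                    return true; // truncated file, e.g. an interrupted split
                  }
                  String msg = ioe.getMessage();
                  // Message thrown by SequenceFile.Reader.readRecordLength()
                  // when the sync-marker check fails.
                  return msg != null && msg.contains("File is corrupt!");
                }
              }
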
          stack stack added a comment -

          Marking as patch available

          nspiegelberg Nicolas Spiegelberg added a comment -

          Note that I created HADOOP-6986 to fix all edge cases of ParseExceptions in the SequenceFile.Reader. I will create a new JIRA to address the edge cases once this fix is in 0.20-append.

          nspiegelberg Nicolas Spiegelberg added a comment -

          @Stack: This JIRA should be good to commit, right? We can address the edge cases with HBASE-3081.

          stack stack added a comment -

          @Nicolas Thanks for clarification. Committed. Thanks for the patch.

          lars_francke Lars Francke added a comment -

          This issue was closed as part of a bulk closing operation on 2015-11-20. All issues that have been resolved and where all fixVersions have been released have been closed (following discussions on the mailing list).


            People

            • Assignee: nspiegelberg Nicolas Spiegelberg
            • Reporter: nspiegelberg Nicolas Spiegelberg
            • Votes: 0
            • Watchers: 2
