Hadoop Common / HADOOP-6017

NameNode and SecondaryNameNode fail to restart because of abnormal filenames.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.3
    • Fix Version/s: 0.18.4, 0.19.2, 0.20.1, 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      SecondaryNameNode (and NameNode) fail to load the edits log. I will include a stack trace in the next comment.

      This is traced to the fact that LeaseManager uses String.replaceFirst() to replace the front of a string with another string. Unfortunately, replaceFirst() uses regex. Although the first argument is quoted by the code, the second argument is not. (The second argument is not really treated as a regex, but it still gets processed for back references, as in 'sed s/first/second/g'.)

      As Nicholas suggested, it is simpler to use substring() to replace part of the string.
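
      The failure mode described above can be reproduced outside Hadoop. The following is a minimal sketch, not the actual LeaseManager code: the class name, method names, and sample paths are hypothetical and chosen only for illustration. It assumes a destination path containing a '$' followed by a non-digit, which replaceFirst() tries to interpret as a group reference.

```java
import java.util.regex.Pattern;

public class LeasePathRename {
    // Regex-based variant (hypothetical): the pattern is quoted, but the
    // replacement string is still scanned for '$' and '\' by the regex engine.
    static String renameWithReplaceFirst(String path, String src, String dst) {
        return path.replaceFirst(Pattern.quote(src), dst);
    }

    // substring-based variant (hypothetical): no regex processing at all,
    // so any character is allowed in the destination prefix.
    static String renameWithSubstring(String path, String src, String dst) {
        if (!path.startsWith(src)) {
            return path; // nothing to rename
        }
        return dst + path.substring(src.length());
    }

    public static void main(String[] args) {
        String path = "/user/old/part-0";
        String src = "/user/old";
        String dst = "/user/a$b"; // '$' followed by a non-digit

        System.out.println(renameWithSubstring(path, src, dst));
        try {
            renameWithReplaceFirst(path, src, dst);
        } catch (IllegalArgumentException e) {
            // The regex engine rejects "$b" as a group reference.
            System.out.println("replaceFirst failed: " + e.getMessage());
        }
    }
}
```

      With these sample inputs, the substring variant returns the renamed path unchanged by any escaping rules, while the replaceFirst variant throws IllegalArgumentException.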

      1. 6017_20090611.patch
        2 kB
        Tsz Wo Nicholas Sze
      2. 6017_20090611b.patch
        2 kB
        Tsz Wo Nicholas Sze
      3. HADOOP-6017-branch-18.patch
        3 kB
        Raghu Angadi
      4. HADOOP-6017-branch-20.patch
        2 kB
        Raghu Angadi

          Activity

          Robert Chansler added a comment -

          Editorial pass over all release notes prior to publication of 0.21. Bug.

          Hudson added a comment -

          Integrated in Hadoop-trunk #869 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/869/ )

          Raghu Angadi added a comment -

          This fixes the actual bug. The edits file is not corrupted. It is just that the NameNode didn't handle filenames properly for certain edit log entries. With this patch, the NN and secondary NN can handle the same edit log properly.

          Allen Wittenauer added a comment -

          Looking at the patch, this appears to only fix input/output validation. How do we deal with our currently corrupted edits file? This patch needs to fix that as well!

          Raghu Angadi added a comment -

          A patch for branch 0.20 is attached. The trunk patch does not apply because of a path difference for the HDFS tests.

          This patch applies to 0.19 as well.

          Raghu Angadi added a comment -

          I just committed this. Thanks Nicholas.

          Raghu Angadi added a comment -

          I am planning to commit these. Hudson is currently running Jun 11th patches.

          Tsz Wo Nicholas Sze added a comment -

          I filed HADOOP-6029 for TestReduceFetch.

          Tsz Wo Nicholas Sze added a comment -

          I ran unit tests on trunk with 6017_20090611b.patch. The only failed test was TestReduceFetch, which seemed unrelated. Then I ran TestReduceFetch on a clean trunk. It also failed on both my Linux and Windows machines.

          Tsz Wo Nicholas Sze added a comment -
               [exec] +1 overall.  
               [exec] 
               [exec]     +1 @author.  The patch does not contain any @author tags.
               [exec] 
               [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
               [exec] 
               [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
               [exec] 
               [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
               [exec] 
               [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
               [exec] 
               [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
               [exec] 
               [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
          
          Raghu Angadi added a comment -

          +1. The patch looks good.

          I am attaching the patch for the 0.18 branch.

          More changes are required for TestRenameWhileOpen.java, since the rename test that the new test is part of was disabled in 0.18. This patch enables parts of the test.

          Tsz Wo Nicholas Sze added a comment -

          6017_20090611b.patch: changed the unit test a little bit.

          Tsz Wo Nicholas Sze added a comment -

          6017_20090611.patch: we should not use String.replaceFirst(..) at all.

          Raghu Angadi added a comment -

          Stacktrace of such a failure:

          2009-06-11 07:14:30,798 ERROR org.apache.hadoop.dfs.NameNode.Secondary: Throwable Exception in doCheckpoint:
          2009-06-11 07:14:30,798 ERROR org.apache.hadoop.dfs.NameNode.Secondary: java.lang.IllegalArgumentException: Illegal group reference
                  at java.util.regex.Matcher.appendReplacement(Matcher.java:713)
                  at java.util.regex.Matcher.replaceFirst(Matcher.java:861)
                  at java.lang.String.replaceFirst(String.java:2147)
                  at org.apache.hadoop.dfs.LeaseManager.changeLease(LeaseManager.java:288)
                  at org.apache.hadoop.dfs.FSNamesystem.changeLease(FSNamesystem.java:4441)
                  at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:563)
                  at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:846)
                  at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:567)
                  at org.apache.hadoop.dfs.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:464)
                  at org.apache.hadoop.dfs.SecondaryNameNode.doMerge(SecondaryNameNode.java:341)
                  at org.apache.hadoop.dfs.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:305)
                  at org.apache.hadoop.dfs.SecondaryNameNode.run(SecondaryNameNode.java:216)
                  at java.lang.Thread.run(Thread.java:619)

          2009-06-11 07:14:30,842 INFO org.apache.hadoop.dfs.NameNode.Secondary: SHUTDOWN_MSG:
          /************************************************************
          SHUTDOWN_MSG: Shutting down SecondaryNameNode at host/ip
          ************************************************************/
          

            People

            • Assignee: Tsz Wo Nicholas Sze
            • Reporter: Raghu Angadi
            • Votes: 0
            • Watchers: 3
