Hadoop Common / HADOOP-3724

Namenode does not start due to exception thrown while saving image

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Restarting the namenode failed during initialization, while it was saving the image, with this stack trace:

      2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
      2008-07-09 00:20:21,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.IOException: saveLeases found path /foo/bar/jambajuice but no matching entry in namespace.  
      at org.apache.hadoop.dfs.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4376)  
      at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:874)  
      at org.apache.hadoop.dfs.FSImage.saveFSImage(FSImage.java:892)  
      at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:81)   
      at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:273)   
      at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:252)   
      at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:148)   
      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:193)   
      at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:179)   
      at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:822)  
      at org.apache.hadoop.dfs.NameNode.main(NameNode.java:831)
      

      The IOException appears to be thrown in saveFilesUnderConstruction.
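
      The failing check can be pictured with the minimal sketch below. It is not the actual FSNamesystem code: the class LeaseCheckSketch, its method signatures, and the use of plain string sets for the namespace and the lease paths are all assumptions made for illustration. Only the exception message is taken from the log above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the consistency check; not the real FSNamesystem.
class LeaseCheckSketch {

    // Returns every lease path that has no matching entry in the namespace.
    static List<String> findDanglingLeasePaths(Set<String> namespace,
                                               Collection<String> leasePaths) {
        List<String> dangling = new ArrayList<>();
        for (String path : leasePaths) {
            if (!namespace.contains(path)) {
                dangling.add(path);
            }
        }
        return dangling;
    }

    // Mimics the reported failure: the save aborts on the first lease path
    // that the namespace does not know about.
    static void saveFilesUnderConstruction(Set<String> namespace,
                                           Collection<String> leasePaths)
            throws IOException {
        for (String path : leasePaths) {
            if (!namespace.contains(path)) {
                throw new IOException("saveLeases found path " + path
                        + " but no matching entry in namespace.");
            }
        }
        // A real implementation would serialize the open files here.
    }

    public static void main(String[] args) {
        Set<String> namespace = new HashSet<>(Arrays.asList("/foo/bar/other"));
        List<String> leasePaths = Arrays.asList("/foo/bar/other", "/foo/bar/jambajuice");
        System.out.println("dangling: " + findDanglingLeasePaths(namespace, leasePaths));
        try {
            saveFilesUnderConstruction(namespace, leasePaths);
        } catch (IOException e) {
            System.out.println("save failed: " + e.getMessage());
        }
    }
}
```

      Under these assumptions, a single lease that points at a path missing from the namespace is enough to abort the image save, and with it namenode startup.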

      Before the restart, the NameNode was killed while some jobs were running. The namenode log from before the shutdown contains many entries like this:

      2008-07-09 00:12:55,301 INFO org.apache.hadoop.fs.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_-510679348, pendingcreates: 1], src=/foo/bar/jambajuice
      2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.internalReleaseCreate: attempt to release a create lock on /foo/bar/jambajuice  file does not exist.
      

      These two lines are repeated every second without end, to the point where the namenode log on a 7-node cluster had grown to nearly 41 GB.
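
      To gauge how many paths are stuck in this loop, the WARN lines can be counted per path. The sketch below is a hypothetical helper, not part of Hadoop: the class StuckLeaseLogScan and its method are invented names, and the regular expression is derived only from the WARN line quoted above, so it may need adjusting for other log formats.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-scanning helper; not part of Hadoop.
class StuckLeaseLogScan {

    // Matches the WARN line quoted in this report and captures the path.
    private static final Pattern STUCK = Pattern.compile(
            "internalReleaseCreate: attempt to release a create lock on (\\S+)\\s+file does not exist");

    // Counts how often each path appears in a failed lease-release warning.
    static Map<String, Integer> countStuckPaths(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = STUCK.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2008-07-09 00:12:55,301 WARN org.apache.hadoop.dfs.StateChange: DIR* "
                        + "NameSystem.internalReleaseCreate: attempt to release a create lock on "
                        + "/foo/bar/jambajuice  file does not exist.",
                "2008-07-09 00:20:21,470 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000");
        System.out.println(countStuckPaths(sample));
    }
}
```

      For a log of the size reported here, the lines would be streamed from the file rather than held in a list; the in-memory list only keeps the sketch short.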

      I could not find any other information about the file, as there were no earlier namenode logs.

      Attachments

        1. renameWhileOpen3.patch
          18 kB
          Dhruba Borthakur
        2. renameWhileOpen2.patch
          18 kB
          Dhruba Borthakur
        3. renameWhileOpen.patch
          13 kB
          Dhruba Borthakur

            People

              Assignee: Dhruba Borthakur (dhruba)
              Reporter: Lohit Vijaya Renu (lohit)