Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-909

Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments


    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20.2, 0.21.0, 0.22.0
    • Fix Version/s: 0.20.3, 0.21.0
    • Component/s: namenode
    • Labels:
    • Environment:


    • Hadoop Flags:


      closing the edits log file can race with write to edits log file operation resulting in OP_INVALID end-of-file marker being initially overwritten by the concurrent (in setReadyToFlush) threads and then removed twice from the buffer, losing a good byte from edits log.


      FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
      FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
      FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush() 
      FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() ->EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
      FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.setReadyToFlush()
      FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
      Any FSEditLog.write

      Access on the edits flush operations is synchronized only in the FSEdits.logSync() method level. However at a lower level access to EditsLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT synchronized. These can be called from concurrent threads like in the example above

      So if a rollEditLog or rollFSIMage is happening at the same time with a write operation it can race for EditLogFileOutputStream.setReadyToFlush that will overwrite the the last byte (normally the FSEditsLog.OP_INVALID which is the "end-of-file marker") and then remove it twice (from each thread) in flushAndSync()! Hence there will be a valid byte missing from the edits log that leads to a SecondaryNameNode silent failure and a full HDFS failure upon cluster restart.

      We got to this point after investigating a corrupted edits file that made HDFS unable to start with

      java.io.IOException: Incorrect data format. logVersion is -20 but writables.length is 768. 
              at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450

      EDIT: moved the logs to a comment to make this readable


        1. ASF.LICENSE.NOT.GRANTED--hdfs-909.txt
          25 kB
          Todd Lipcon
        2. ASF.LICENSE.NOT.GRANTED--hdfs-909-branch-0.20.txt
          25 kB
          Todd Lipcon
        3. hdfs-909.txt
          25 kB
          Todd Lipcon
        4. hdfs-909.txt
          25 kB
          Todd Lipcon
        5. hdfs-909.txt
          12 kB
          Todd Lipcon
        6. hdfs-909.txt
          12 kB
          Todd Lipcon
        7. hdfs-909.txt
          11 kB
          Todd Lipcon
        8. hdfs-909-ammendation.txt
          7 kB
          Todd Lipcon
        9. hdfs-909-branch-0.20.txt
          25 kB
          Todd Lipcon
        10. hdfs-909-branch-0.20.txt
          25 kB
          Todd Lipcon
        11. hdfs-909-branch-0.21.txt
          25 kB
          Konstantin Shvachko
        12. hdfs-909-branch-0.21.txt
          24 kB
          Konstantin Shvachko
        13. hdfs-909-unified.txt
          26 kB
          Todd Lipcon
        14. hdfs-909-unittest.txt
          6 kB
          Todd Lipcon

        Issue Links



            • Assignee:
              tlipcon Todd Lipcon
              clehene Cosmin Lehene


              • Created:

                Issue deployment