Hadoop HDFS / HDFS-1921

Save namespace can cause NN to be unable to come up on restart

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: None
    • Labels: None

    Description

      I discovered this in the course of trying to implement a fix for HDFS-1505.

      Per the comment for FSImage.saveNamespace(...), saving the namespace proceeds in the following order across the NN's storage directories:

      1. rename current to lastcheckpoint.tmp for all of them,
      2. save image and recreate edits for all of them,
      3. rename lastcheckpoint.tmp to previous.checkpoint.
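      The three steps above can be sketched as a small Python simulation of the storage-directory renames. This is a simplified model, not Hadoop's (Java) FSImage code: the `fsimage` marker file, the `write_image` callback, and the `analyze_on_startup` check with its state strings are hypothetical stand-ins. A failure in step 2 is swallowed and step 3 still runs in every directory, which reproduces the outcome described below.

```python
import os
import shutil
import tempfile

# Simplified model of NN storage directories. The three directory names come
# from the issue text; the "fsimage" marker file and the state strings used by
# analyze_on_startup are hypothetical simplifications.
CURRENT = "current"
LAST_TMP = "lastcheckpoint.tmp"
PREVIOUS = "previous.checkpoint"


def save_namespace(storage_dirs, write_image):
    """Three-step save as described in the FSImage.saveNamespace comment.

    write_image(path) is a hypothetical callback that writes the new image
    and edits into `path`; it may raise OSError.
    """
    # Step 1: rename current -> lastcheckpoint.tmp in every directory.
    for sd in storage_dirs:
        os.rename(os.path.join(sd, CURRENT), os.path.join(sd, LAST_TMP))

    # Step 2: save the image and recreate edits in a fresh current directory.
    for sd in storage_dirs:
        cur = os.path.join(sd, CURRENT)
        try:
            os.mkdir(cur)
            write_image(cur)
        except OSError:
            pass  # the failure is swallowed; nothing stops step 3 below

    # Step 3: rename lastcheckpoint.tmp -> previous.checkpoint in EVERY
    # directory, including those where step 2 failed (the reported bug).
    for sd in storage_dirs:
        prev = os.path.join(sd, PREVIOUS)
        shutil.rmtree(prev, ignore_errors=True)  # drop any older checkpoint
        os.rename(os.path.join(sd, LAST_TMP), prev)


def analyze_on_startup(sd):
    """Roughly what a restarting NN must decide for one storage directory."""
    if os.path.isfile(os.path.join(sd, CURRENT, "fsimage")):
        return "NORMAL"
    if os.path.isdir(os.path.join(sd, LAST_TMP)):
        return "RECOVER_CHECKPOINT"  # could roll lastcheckpoint.tmp back
    return "NOT_FORMATTED"


if __name__ == "__main__":
    sd = tempfile.mkdtemp()
    os.mkdir(os.path.join(sd, CURRENT))
    open(os.path.join(sd, CURRENT, "fsimage"), "w").close()

    def failing_write(path):
        raise OSError("simulated disk error during step 2")

    save_namespace([sd], failing_write)
    print(analyze_on_startup(sd))  # NOT_FORMATTED
```

      With a failing write, the old image ends up only in previous.checkpoint, current is empty, and no lastcheckpoint.tmp is left to recover from.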

      The problem is that step 3 runs for every storage directory, regardless of whether step 2 failed for that directory. Upon restart, the NN will see a non-existent or corrupt current directory and no lastcheckpoint.tmp directory, and so will conclude that the storage directory is not formatted. If step 2 failed in all storage directories, the NN cannot come up at all.
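      One way to close this gap is to run step 3 only in directories where step 2 succeeded and to roll the others back. The Python sketch below illustrates that idea on a simplified model; it is a hypothetical illustration, not taken from the attached patches, which may handle failed directories differently. The `fsimage` marker file, the `write_image` callback, `save_namespace_guarded`, and the state strings are all assumptions.

```python
import os
import shutil
import tempfile

# Simplified model: directory names come from the issue text; the "fsimage"
# marker file and the state strings are hypothetical simplifications.
CURRENT = "current"
LAST_TMP = "lastcheckpoint.tmp"
PREVIOUS = "previous.checkpoint"


def save_namespace_guarded(storage_dirs, write_image):
    """Save namespace, finishing step 3 only where step 2 succeeded.

    write_image(path) is a hypothetical callback that writes the new image
    and edits into `path`; it may raise OSError. Returns the directories
    whose save failed so the caller can report or disable them.
    """
    # Step 1: rename current -> lastcheckpoint.tmp in every directory.
    for sd in storage_dirs:
        os.rename(os.path.join(sd, CURRENT), os.path.join(sd, LAST_TMP))

    # Step 2: save the image, remembering which directories failed.
    failed = []
    for sd in storage_dirs:
        cur = os.path.join(sd, CURRENT)
        try:
            os.mkdir(cur)
            write_image(cur)
        except OSError:
            failed.append(sd)

    # Step 3, guarded: promote lastcheckpoint.tmp only where step 2 worked;
    # elsewhere discard the partial image and restore the old one as current.
    for sd in storage_dirs:
        cur = os.path.join(sd, CURRENT)
        tmp = os.path.join(sd, LAST_TMP)
        if sd in failed:
            shutil.rmtree(cur, ignore_errors=True)
            os.rename(tmp, cur)
        else:
            prev = os.path.join(sd, PREVIOUS)
            shutil.rmtree(prev, ignore_errors=True)
            os.rename(tmp, prev)
    return failed


def analyze_on_startup(sd):
    """Simplified restart check for one storage directory."""
    if os.path.isfile(os.path.join(sd, CURRENT, "fsimage")):
        return "NORMAL"
    if os.path.isdir(os.path.join(sd, LAST_TMP)):
        return "RECOVER_CHECKPOINT"
    return "NOT_FORMATTED"


if __name__ == "__main__":
    good, bad = tempfile.mkdtemp(), tempfile.mkdtemp()
    for sd in (good, bad):
        os.mkdir(os.path.join(sd, CURRENT))
        open(os.path.join(sd, CURRENT, "fsimage"), "w").close()

    def write_image(path):
        if os.path.dirname(path) == bad:
            raise OSError("simulated disk error during step 2")
        open(os.path.join(path, "fsimage"), "w").close()

    print(save_namespace_guarded([good, bad], write_image) == [bad])  # True
    print(analyze_on_startup(good), analyze_on_startup(bad))  # NORMAL NORMAL
```

      Restoring the old image in place means every directory still holds a valid image after a partial failure. An alternative design is to leave lastcheckpoint.tmp untouched in the failed directories and let the NN roll it back on restart.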

      This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker.

      Attachments

        1. hdfs1921_v23.patch (3 kB, Matthew Foley)
        2. hdfs-1505-1-test.txt (3 kB, Matthew Foley)
        3. hdfs1921_v23.patch (3 kB, Matthew Foley)
        4. hdfs-1921.txt (5 kB, Todd Lipcon)
        5. hdfs-1921-2.patch (5 kB, Matthew Foley)
        6. hdfs-1921-2_v22.patch (5 kB, Matthew Foley)

            People

              Assignee: Matthew Foley (mattf)
              Reporter: Aaron Myers (atm)
              Votes: 0
              Watchers: 4