Hadoop HDFS / HDFS-1921

Save namespace can cause NN to be unable to come up on restart

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.22.0, 0.23.0
    • Fix Version/s: 0.22.0, 0.23.0
    • Component/s: None
    • Labels: None

      Description

      I discovered this in the course of trying to implement a fix for HDFS-1505.

      Per the comment for FSImage.saveNamespace(...), the algorithm for save namespace proceeds in the following order:

      1. rename current to lastcheckpoint.tmp for all of them,
      2. save image and recreate edits for all of them,
      3. rename lastcheckpoint.tmp to previous.checkpoint.

      The problem is that step 3 runs even if step 2 fails for every storage directory. Upon restart, the NN then sees non-existent or corrupt current directories and no lastcheckpoint.tmp directories, and so concludes that the storage directories are not formatted.
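
      The three steps, plus a guard against the failure described above, can be sketched as follows. This is a minimal, self-contained illustration and not the actual FSImage code or the attached patch: the class SaveNamespaceSketch, the ImageWriter interface, and the choice to promote only the directories that saved successfully are all assumptions made for the example. It also assumes that no previous.checkpoint directory exists yet when step 3 runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the save-namespace sequence; not the real FSImage code. */
final class SaveNamespaceSketch {

    /** Hypothetical stand-in for "save image and recreate edits" in one directory. */
    interface ImageWriter {
        void write(Path currentDir) throws IOException;
    }

    /**
     * Runs the three steps over all storage directories and returns the
     * directories whose save succeeded. Step 3 is skipped if none succeeded.
     */
    static List<Path> saveNamespace(List<Path> storageDirs, ImageWriter writer)
            throws IOException {
        // Step 1: rename current to lastcheckpoint.tmp for all of them.
        for (Path sd : storageDirs) {
            Files.move(sd.resolve("current"), sd.resolve("lastcheckpoint.tmp"));
        }

        // Step 2: save image and recreate edits, tracking which directories succeed.
        List<Path> succeeded = new ArrayList<>();
        for (Path sd : storageDirs) {
            try {
                Path current = Files.createDirectory(sd.resolve("current"));
                writer.write(current);
                succeeded.add(sd);
            } catch (IOException e) {
                // This directory failed; its lastcheckpoint.tmp still holds the old state.
            }
        }

        // Guard (the assumed fix): if every directory failed, keep lastcheckpoint.tmp
        // so that a restart can still recover the previous state.
        if (succeeded.isEmpty()) {
            throw new IOException("save failed in all storage directories");
        }

        // Step 3: rename lastcheckpoint.tmp to previous.checkpoint.
        for (Path sd : succeeded) {
            Files.move(sd.resolve("lastcheckpoint.tmp"), sd.resolve("previous.checkpoint"));
        }
        return succeeded;
    }
}
```

      Without the guard, step 3 would rename away the only intact copy of the old state even though no new current directory had been written successfully, which is the condition that leaves the NN unable to start.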

      This issue appears to be present on both 0.22 and 0.23. This should arguably be a 0.22/0.23 blocker.

        Attachments

        1. hdfs-1921-2_v22.patch
          5 kB
          Matt Foley
        2. hdfs-1921-2.patch
          5 kB
          Matt Foley
        3. hdfs-1921.txt
          5 kB
          Todd Lipcon
        4. hdfs1921_v23.patch
          3 kB
          Matt Foley
        5. hdfs-1505-1-test.txt
          3 kB
          Matt Foley
        6. hdfs1921_v23.patch
          3 kB
          Matt Foley

              People

              • Assignee: Matt Foley (mattf)
              • Reporter: Aaron T. Myers (atm)
              • Votes: 0
              • Watchers: 4