Hadoop HDFS / HDFS-903

NN should verify images and edit logs on startup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.22.0
    • Component/s: namenode
    • Labels: None
    • Hadoop Flags: Incompatible change, Reviewed
    • Release Note: Store the fsimage MD5 checksum in the VERSION file. Validate the checksum when loading an fsimage. Layout version bumped.
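
    The release note's approach (record an MD5 digest of the image next to it, then recompute and compare when the image is loaded) can be illustrated with a minimal standalone sketch. This is not the code from the attached patches: the class name, the method names, and the property key `imageMD5Digest` are assumptions chosen for illustration, and the VERSION file is modeled as a plain java.util.Properties file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Properties;

public class ImageChecksumSketch {
    // Hypothetical property key, used here for illustration only.
    public static final String DIGEST_KEY = "imageMD5Digest";

    /** Streams the file through MD5 and returns the digest as lowercase hex. */
    public static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Reading drives the digest; the bytes themselves are not needed.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Records the image's digest in the VERSION-like properties file. */
    public static void saveDigest(Path image, Path versionFile) throws IOException {
        Properties props = new Properties();
        if (Files.exists(versionFile)) {
            try (InputStream in = Files.newInputStream(versionFile)) {
                props.load(in);
            }
        }
        props.setProperty(DIGEST_KEY, md5Hex(image));
        try (OutputStream out = Files.newOutputStream(versionFile)) {
            props.store(out, null);
        }
    }

    /** Recomputes the digest and throws if it does not match the stored one. */
    public static void verifyDigest(Path image, Path versionFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(versionFile)) {
            props.load(in);
        }
        String expected = props.getProperty(DIGEST_KEY);
        String actual = md5Hex(image);
        if (expected == null || !expected.equals(actual)) {
            throw new IOException("Image " + image + " failed verification: expected MD5 "
                    + expected + " but computed " + actual);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("name-dir");
        Path image = dir.resolve("fsimage");
        Path version = dir.resolve("VERSION");
        Files.write(image, new byte[] {1, 2, 3, 4});
        saveDigest(image, version);
        verifyDigest(image, version); // passes: image is unchanged

        Files.write(image, new byte[] {1, 2, 3, 5}); // simulate silent corruption
        try {
            verifyDigest(image, version);
            System.out.println("corruption missed");
        } catch (IOException e) {
            System.out.println("corruption detected: " + e.getMessage());
        }
    }
}
```

    Keeping the digest outside the image file means a flipped byte in the image cannot also "fix" its own checksum, at the cost of having to keep the two files in sync whenever a new image is written.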

    Description

      I experimented with corrupting the fsimage and edit logs when multiple dfs.name.dir directories are specified. I noticed that:

      • As long as the corruption does not make the image structurally invalid (e.g. by changing an opcode into an invalid opcode), HDFS does not notice and happily uses the corrupt image or applies the corrupt edit.
      • If the first image in dfs.name.dir is "valid", it replaces the copies in the other name directories, even if they differ from it. That is, if the first image is actually invalid, old, or corrupt metadata, then you have lost your valid metadata, which can result in data loss if the namenode garbage collects blocks that it thinks are no longer used.

      How about we maintain a checksum as part of the image and edit log, check it on startup, and refuse to start up if the checksums differ? Or at least provide a configuration option to do so, in case people are worried about the overhead of maintaining checksums for these files. Even if we assume dfs.name.dir is reliable storage, this guards against operator errors.
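
      The startup check proposed above, combined with the multiple-directory problem from the second bullet, could look roughly like the following sketch. It compares the digest of the same file across all configured name directories and refuses to proceed if any copy disagrees. The class, the method names, and the `verify` flag (standing in for the suggested configuration option) are hypothetical, and the sketch only detects disagreement; it does not decide which copy is the good one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NameDirConsistencySketch {
    /** Returns the raw MD5 digest of a file (read fully into memory for brevity). */
    public static byte[] md5(Path file) throws IOException {
        try {
            return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /**
     * Returns the directories whose copy of fileName differs from the copy in
     * the first directory. The first copy is only a comparison baseline here,
     * not a trusted source of truth.
     */
    public static List<Path> findMismatches(List<Path> nameDirs, String fileName)
            throws IOException {
        List<Path> mismatched = new ArrayList<>();
        byte[] baseline = md5(nameDirs.get(0).resolve(fileName));
        for (Path dir : nameDirs.subList(1, nameDirs.size())) {
            if (!MessageDigest.isEqual(baseline, md5(dir.resolve(fileName)))) {
                mismatched.add(dir);
            }
        }
        return mismatched;
    }

    /** Throws if verification is enabled and the copies disagree. */
    public static void checkOrRefuse(List<Path> nameDirs, String fileName, boolean verify)
            throws IOException {
        if (!verify) {
            return; // hypothetical opt-out for operators worried about overhead
        }
        List<Path> mismatched = findMismatches(nameDirs, fileName);
        if (!mismatched.isEmpty()) {
            throw new IOException("Refusing to start: copies of " + fileName
                    + " differ in " + mismatched);
        }
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempDirectory("name1");
        Path b = Files.createTempDirectory("name2");
        Files.write(a.resolve("fsimage"), new byte[] {10, 20, 30});
        Files.write(b.resolve("fsimage"), new byte[] {10, 20, 31}); // diverged copy
        try {
            checkOrRefuse(Arrays.asList(a, b), "fsimage", true);
            System.out.println("copies agree");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      Note that agreement between copies does not prove they are correct, since every copy could carry the same corruption. A stored checksum for each file is still needed to catch that case.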

      Attachments

        1. trunkChecksumImage4.patch
          24 kB
          Hairong Kuang
        2. trunkChecksumImage3.patch
          24 kB
          Hairong Kuang
        3. trunkChecksumImage2.patch
          22 kB
          Hairong Kuang
        4. trunkChecksumImage1.patch
          23 kB
          Hairong Kuang
        5. trunkChecksumImage.patch
          19 kB
          Hairong Kuang

          People

            Assignee: Hairong Kuang
            Reporter: Eli Collins
            Votes: 0
            Watchers: 17

