  1. Hadoop HDFS
  2. HDFS-7443

Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume


Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.1, 3.0.0-alpha1
    • None

    Description

      When we upgraded a medium-sized cluster from 2.5 to 2.6, about 4% of the datanodes did not come up. They attempted the data file layout upgrade to BLOCKID_BASED_LAYOUT, introduced in HDFS-6482, but it failed.

      All failures were caused by NativeIO.link() throwing an IOException reporting EEXIST. The datanodes did not die right away; the upgrade was retried each time block pool initialization was retried, which happened whenever BPServiceActor tried to register with the namenode. After many retries, the datanodes terminated. This left previous.tmp, and a current with no VERSION file, in the block pool slice storage directory.
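The EEXIST failure is what a hard-link call reports when the destination name is already taken. As a minimal sketch, and not the datanode's actual upgrade code, the following uses the standard java.nio Files.createLink in place of NativeIO.link(). It assumes that two copies of one block file, sitting in different old subdirectories, are both linked into the same destination directory, as the issue title suggests. The class name, directory names, and block file name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DuplicateLinkDemo {
    /**
     * Creates two copies of a block file in different "old layout" directories,
     * then hard-links both to the same destination path.
     * Returns true if the second link fails because the destination exists.
     */
    static boolean secondLinkCollides() {
        try {
            Path root = Files.createTempDirectory("hdfs7443-demo");
            // Hypothetical old-layout directories holding duplicate block files.
            Path oldA = Files.createDirectories(root.resolve("old/subdir1"));
            Path oldB = Files.createDirectories(root.resolve("old/subdir2"));
            // Hypothetical new-layout directory; both duplicates map to it.
            Path target = Files.createDirectories(root.resolve("new/subdir0/subdir0"));

            Path copy1 = Files.write(oldA.resolve("blk_1001"), new byte[] {1});
            Path copy2 = Files.write(oldB.resolve("blk_1001"), new byte[] {1});
            Path dest = target.resolve("blk_1001");

            Files.createLink(dest, copy1);      // first link succeeds
            try {
                Files.createLink(dest, copy2);  // same destination name again
                return false;
            } catch (FileAlreadyExistsException e) {
                return true;                    // the Java-level analogue of EEXIST
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second link collided: " + secondLinkCollides());
    }
}
```

Under these assumptions a single duplicate is enough to abort the whole volume's upgrade, which would be consistent with only a fraction of the datanodes failing.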

      Although previous.tmp contained the old VERSION file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in Storage removes current and renames previous.tmp to current before retrying. All successfully upgraded volumes had old state preserved in their previous directory.
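The recovery step described above (remove current, then rename previous.tmp to current) can be sketched as follows. This is an illustration of the expected behavior only, written with java.nio rather than Hadoop's Storage class; the class and method names are hypothetical, and the real logic handles more states than this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UpgradeRecoverySketch {
    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /**
     * If previous.tmp exists, discards the half-upgraded current and
     * restores previous.tmp as current. Returns true if recovery ran.
     */
    static boolean recoverUpgrade(Path storageDir) {
        Path current = storageDir.resolve("current");
        Path previousTmp = storageDir.resolve("previous.tmp");
        if (!Files.isDirectory(previousTmp)) {
            return false; // nothing to recover
        }
        try {
            if (Files.exists(current)) {
                deleteRecursively(current);
            }
            Files.move(previousTmp, current);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    /** Builds a failed-upgrade state in a temp directory and checks the recovered result. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("recovery-demo");
            Files.createDirectories(dir.resolve("current/subdir0")); // half-upgraded, no VERSION
            Files.createDirectories(dir.resolve("previous.tmp"));
            Files.write(dir.resolve("previous.tmp/VERSION"), "old".getBytes());
            boolean ran = recoverUpgrade(dir);
            return ran
                && Files.exists(dir.resolve("current/VERSION"))
                && !Files.exists(dir.resolve("previous.tmp"))
                && !Files.exists(dir.resolve("current/subdir0"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After a recovery of this kind, current should hold the original pre-upgrade content before the next attempt starts, which is why finding new-layout content inside previous.tmp was unexpected.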

      In summary there were two observed issues.

      • Upgrade failure with link() failing with EEXIST
      • previous.tmp contained a half-upgraded layout rather than the content of the original current.

      We did not see this in smaller scale test clusters.


          People

            Assignee: Colin McCabe (cmccabe)
            Reporter: Kihwal Lee (kihwal)
            Votes: 0
            Watchers: 13
