  1. Hadoop HDFS
  2. HDFS-7443

Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.1, 3.0.0-alpha1
    • Component/s: None
    • Labels:
    • Target Version/s:

      Description

      When we upgraded a medium-sized cluster from 2.5 to 2.6, about 4% of the datanodes did not come up. They attempted the data file layout upgrade to BLOCKID_BASED_LAYOUT, introduced in HDFS-6482, but failed.

      All failures were caused by NativeIO.link() throwing an IOException reporting EEXIST. The datanodes did not die right away; instead, the upgrade was retried each time block pool initialization was retried, which happened whenever BPServiceActor tried to register with the namenode. After many retries, the datanodes terminated. This left previous.tmp, and a current with no VERSION file, in the block pool slice storage directory.

      Although previous.tmp contained the old VERSION file, its content was in the new layout and the subdirs were all newly created. This should not have happened, because the upgrade-recovery logic in Storage removes current and renames previous.tmp to current before retrying. All successfully upgraded volumes had their old state preserved in their previous directory.
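
      The recovery step described above (remove current, then rename previous.tmp back to current) can be sketched as follows. This is only an illustration of the expected behavior under that description, not the actual code in Storage; the function name recover_interrupted_upgrade and the storage_dir parameter are hypothetical.

```python
import os
import shutil


def recover_interrupted_upgrade(storage_dir):
    """Roll an interrupted upgrade back to the pre-upgrade state.

    Hypothetical helper: mirrors the behavior described in this report,
    where a leftover previous.tmp means the upgrade did not complete.
    Returns True if a rollback was performed.
    """
    current = os.path.join(storage_dir, "current")
    previous_tmp = os.path.join(storage_dir, "previous.tmp")

    if not os.path.isdir(previous_tmp):
        return False  # no interrupted upgrade to recover from

    # Discard the half-upgraded directory, if one exists.
    if os.path.isdir(current):
        shutil.rmtree(current)

    # Restore the original content, including the old VERSION file.
    os.rename(previous_tmp, current)
    return True
```

      If recovery ran this way on every retry, previous.tmp should always hold the original content of current when the next attempt starts, which is why finding new-layout content in previous.tmp was unexpected.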

      In summary, two issues were observed:

      • The upgrade failed because link() failed with EEXIST.
      • previous.tmp contained a half-upgraded layout rather than the content of the original current.
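
      The first issue can be reproduced in miniature. The sketch below assumes, as the issue title suggests, that the same block file name exists in two subdirectories of one volume and that the new layout derives the destination from the block name alone, so both files map to the same target. The functions link_blocks and make_volume_with_duplicate, the skip_duplicates flag, and the flat destination directory are all hypothetical simplifications; skipping duplicates is just one possible handling and is not necessarily what the attached patch does.

```python
import os
import tempfile


def link_blocks(src_root, dst_dir, skip_duplicates=False):
    """Hard-link every blk_* file found under src_root into dst_dir.

    The destination depends only on the file name, so two source files
    with the same name collide. Returns the source paths that were skipped.
    """
    os.makedirs(dst_dir, exist_ok=True)
    skipped = []
    for dirpath, dirs, files in os.walk(src_root):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            if not name.startswith("blk_"):
                continue
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            try:
                os.link(src, dst)
            except FileExistsError:  # the OS reported EEXIST
                if not skip_duplicates:
                    raise
                skipped.append(src)
    return skipped


def make_volume_with_duplicate(root):
    """Create an old-style tree holding the same block file name twice."""
    for sub in ("subdir0", "subdir1"):
        os.makedirs(os.path.join(root, sub))
        with open(os.path.join(root, sub, "blk_1001"), "w") as f:
            f.write(sub)
    return root


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    old = make_volume_with_duplicate(os.path.join(base, "old"))
    # With skip_duplicates=False this call would raise FileExistsError.
    print(link_blocks(old, os.path.join(base, "new"), skip_duplicates=True))
```

      With skip_duplicates left at False, the second link attempt raises FileExistsError, which is the Python counterpart of the EEXIST failure reported here.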

      We did not see this in smaller-scale test clusters.

        Attachments

        1. HDFS-7443.002.patch
          1.57 MB
          Colin P. McCabe
        2. HDFS-7443.001.patch
          1.57 MB
          Colin P. McCabe

              People

              • Assignee:
                cmccabe Colin P. McCabe
                Reporter:
                kihwal Kihwal Lee
              • Votes:
                0
                Watchers:
                14