  1. Hadoop HDFS
  2. HDFS-12800

Potential missing disks/blocks when a DataNode is upgraded with a data layout change

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      During an upgrade with a data layout change, we found that some disks were not converted to the new layout version, which caused some blocks to go missing. The root cause is a race condition in the doUpgrade process.

      In the current loadBlockPoolSliceStorage implementation in DataStorage.java, each iteration of the datadir for-loop restores trash, generates the upgrade tasks for that datadir, and submits them to the executor before moving on to the next datadir.

          for (StorageLocation dataDir : dataDirs) {
            dataDir.makeBlockPoolDir(bpid, null);
            try {
              final List<Callable<StorageDirectory>> callables = Lists.newArrayList();
              final List<StorageDirectory> dirs = bpStorage.recoverTransitionRead(
                  nsInfo, dataDir, startOpt, callables, datanode.getConf());
              if (callables.isEmpty()) {
                ......
              } else {
                for(Callable<StorageDirectory> c : callables) {
                  tasks.add(new UpgradeTask(dataDir, executor.submit(c)));
                }
              }
            } catch (IOException e) {
              ......
            }
          }
      

      The doUpgrade task itself updates the layoutVersion field:

      this.layoutVersion = HdfsServerConstants.DATANODE_LAYOUT_VERSION;
      

      This breaks upgrade task generation for the other datadirs (BlockPoolSliceStorage.java). If an already-submitted task has updated layoutVersion by the time a later datadir is checked, the second if condition below can evaluate to false, so that disk is never added to the upgrade task list. As a result, only some of the disks are upgraded to the new layout format and a few are not. Restarting the DataNodes reduces the number that are missing.

          if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION) {
            int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
            LOG.info("Restored " + restored + " block files from trash " +
              "before the layout upgrade. These blocks will be moved to " +
              "the previous directory during the upgrade");
          }
          if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION
              || this.cTime < nsInfo.getCTime()) {
            doUpgrade(sd, nsInfo, callables, conf); // upgrade
            return true;
          }
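
      The interleaving described above can be reproduced outside HDFS with a minimal sketch. Everything in it is hypothetical: the class name, the two version constants, and the method names are illustrative stand-ins, not HDFS code. The only things taken from the report are the shape of the check (current version greater than target version) and the fact that the submitted task overwrites the shared field. The first method forces the unlucky ordering by waiting for each task before examining the next dir. The second shows one possible way to avoid the problem in this sketch, by reading the field once before any task is submitted. It is not presented as the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the shared storage object; not HDFS code.
class LayoutUpgradeRaceSketch {
  // Illustrative values only. As in the report's check, a larger value means "older".
  static final int OLD_LAYOUT_VERSION = -56;
  static final int NEW_LAYOUT_VERSION = -57;

  // One field shared by every data dir, like this.layoutVersion in the report.
  private volatile int layoutVersion = OLD_LAYOUT_VERSION;

  /** Racy pattern: the check for a later dir reads a field an earlier task may have updated. */
  int scheduleCheckingSharedField(int numDirs) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      int scheduled = 0;
      for (int i = 0; i < numDirs; i++) {
        if (layoutVersion > NEW_LAYOUT_VERSION) {
          Future<?> task = executor.submit(() -> { layoutVersion = NEW_LAYOUT_VERSION; });
          scheduled++;
          // Force the unlucky interleaving so the sketch is deterministic:
          // the task finishes before the next dir is examined.
          await(task);
        }
      }
      return scheduled;
    } finally {
      executor.shutdown();
    }
  }

  /** Variant for comparison: decide once, before any task can modify the field. */
  int scheduleCheckingSnapshot(int numDirs) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      final boolean needsUpgrade = layoutVersion > NEW_LAYOUT_VERSION;
      List<Future<?>> tasks = new ArrayList<>();
      for (int i = 0; i < numDirs; i++) {
        if (needsUpgrade) {
          tasks.add(executor.submit(() -> { layoutVersion = NEW_LAYOUT_VERSION; }));
        }
      }
      for (Future<?> task : tasks) {
        await(task);
      }
      return tasks.size();
    } finally {
      executor.shutdown();
    }
  }

  // Waits for a task and converts checked exceptions to unchecked ones.
  private static void await(Future<?> task) {
    try {
      task.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int racy = new LayoutUpgradeRaceSketch().scheduleCheckingSharedField(4);
    int snapshot = new LayoutUpgradeRaceSketch().scheduleCheckingSnapshot(4);
    System.out.println("shared field check: " + racy + " of 4 dirs scheduled");
    System.out.println("snapshot check: " + snapshot + " of 4 dirs scheduled");
  }
}
```

      In a real DataNode the ordering is not forced, so the number of skipped dirs would vary from run to run, which is consistent with the observation that only part of the disks are affected.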
      

            People

            • Assignee: ywskycn Wei Yan
            • Reporter: ywskycn Wei Yan