Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
During an upgrade that includes a data layout change, we found that some disks were not converted to the new layout version, which caused some blocks to go missing. The root cause is a race condition in the doUpgrade process.
In the current loadBlockPoolSliceStorage implementation in DataStorage.java, each iteration of the for-loop over the datadirs restores trash, generates the upgrade tasks for that datadir, and submits those tasks to an executor at the end of the iteration, before the next datadir is examined.
    for (StorageLocation dataDir : dataDirs) {
      dataDir.makeBlockPoolDir(bpid, null);
      try {
        final List<Callable<StorageDirectory>> callables = Lists.newArrayList();
        final List<StorageDirectory> dirs = bpStorage.recoverTransitionRead(
            nsInfo, dataDir, startOpt, callables, datanode.getConf());
        if (callables.isEmpty()) {
          ......
        } else {
          for (Callable<StorageDirectory> c : callables) {
            tasks.add(new UpgradeTask(dataDir, executor.submit(c)));
          }
        }
      } catch (IOException e) {
        ......
      }
    }
The doUpgrade task itself updates the layoutVersion variable:
    this.layoutVersion = HdfsServerConstants.DATANODE_LAYOUT_VERSION;
This breaks upgrade task generation for the other datadirs (BlockPoolSliceStorage.java). Once a running task has set layoutVersion to DATANODE_LAYOUT_VERSION, the second if condition below can evaluate to false for datadirs that have not been examined yet, so those disks are never added to the upgrade task list. As a result, only some of the disks are upgraded to the new layout format and the rest are not. Restarting the DataNodes reduces the number of missing blocks.
    if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION) {
      int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
      LOG.info("Restored " + restored + " block files from trash " +
          "before the layout upgrade. These blocks will be moved to " +
          "the previous directory during the upgrade");
    }
    if (this.layoutVersion > HdfsServerConstants.DATANODE_LAYOUT_VERSION
        || this.cTime < nsInfo.getCTime()) {
      doUpgrade(sd, nsInfo, callables, conf); // upgrade
      return true;
    }
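The interleaving described above can be reproduced outside HDFS with a small standalone sketch. It is not the DataNode code: the class LayoutRaceSketch, the method scheduleUpgrades, the waitForEachTask flag, and the two version values are all hypothetical and chosen only for illustration. The sketch only assumes what the description states, namely that the per-datadir check and the submitted task share one layoutVersion field. Waiting for each task before checking the next datadir forces the unlucky ordering deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LayoutRaceSketch {
    // Illustrative values only. HDFS layout versions are negative,
    // so an older layout compares greater than a newer one.
    static final int OLD_LAYOUT = -56;
    static final int NEW_LAYOUT = -57;

    // Shared by the scheduling loop and by every upgrade task.
    private volatile int layoutVersion = OLD_LAYOUT;

    /**
     * Walks numDirs simulated datadirs and returns how many of them got an
     * upgrade task. When waitForEachTask is true, each task completes before
     * the next datadir is checked, which forces the problematic interleaving.
     */
    public int scheduleUpgrades(int numDirs, boolean waitForEachTask) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<Integer>> tasks = new ArrayList<>();
        try {
            for (int dir = 0; dir < numDirs; dir++) {
                final int id = dir;
                // Same shape as the layoutVersion check in the description.
                if (layoutVersion > NEW_LAYOUT) {
                    Callable<Integer> upgrade = () -> {
                        layoutVersion = NEW_LAYOUT; // side effect on shared state
                        return id;
                    };
                    Future<Integer> future = executor.submit(upgrade);
                    tasks.add(future);
                    if (waitForEachTask) {
                        future.get();
                    }
                }
            }
            // Let every scheduled task finish before reporting.
            for (Future<Integer> future : tasks) {
                future.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return tasks.size();
    }

    public static void main(String[] args) {
        int scheduled = new LayoutRaceSketch().scheduleUpgrades(4, true);
        System.out.println("datadirs scheduled for upgrade: " + scheduled);
    }
}
```

With waitForEachTask set to true, only the first of the four simulated datadirs is scheduled. With it set to false, the count depends on thread timing and can be anything from 1 to 4, which matches the observation that only part of the disks are upgraded and that the outcome varies between restarts.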