[HDFS-7645] Rolling upgrade is restoring blocks from trash multiple times - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
Component/s: datanode
Labels: None
Hadoop Flags: Reviewed

Description

When performing an HDFS rolling upgrade, the trash directory is restored twice, when under normal circumstances it should not need to be restored at all. If I understand correctly, the only time these blocks should be restored is when a rolling upgrade has to be rolled back.

On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes and, more importantly, in the namenode.

The two times this happens are:
1) Restart of the DN onto new software

  private void doTransition(DataNode datanode, StorageDirectory sd,
      NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
    if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
      Preconditions.checkState(!getTrashRootDir(sd).exists(),
          sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " +
          " both be present.");
      doRollback(sd, nsInfo); // rollback if applicable
    } else {
      // Restore all the files in the trash. The restored files are retained
      // during rolling upgrade rollback. They are deleted during rolling
      // upgrade downgrade.
      int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
      LOG.info("Restored " + restored + " block files from trash.");
    }
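
To make the concern concrete, here is a minimal, self-contained sketch of the startup decision the description argues for: block files in trash are restored only when the DataNode starts with a rollback option, and are otherwise left where they are. The class `TrashStartupSketch`, the enum `StartOpt`, and the in-memory lists are hypothetical stand-ins for illustration; this is not HDFS code and not necessarily what the attached patches do.

```java
import java.util.ArrayList;
import java.util.List;

public class TrashStartupSketch {
    // Hypothetical stand-in for the DataNode startup options.
    enum StartOpt { REGULAR, ROLLBACK }

    // In-memory stand-ins for the block files under "current" and "trash".
    final List<String> current = new ArrayList<>();
    final List<String> trash = new ArrayList<>();

    /** Moves every file in trash back to current; returns the number moved. */
    int restoreFromTrash() {
        int restored = trash.size();
        current.addAll(trash);
        trash.clear();
        return restored;
    }

    /** Restores trash only on rollback; a regular restart leaves it untouched. */
    int onStartup(StartOpt opt) {
        if (opt == StartOpt.ROLLBACK) {
            return restoreFromTrash();
        }
        return 0;
    }

    public static void main(String[] args) {
        TrashStartupSketch dn = new TrashStartupSketch();
        dn.current.add("blk_1");
        dn.trash.add("blk_2");
        // Regular restart onto new software: nothing is restored.
        System.out.println("restored on restart: " + dn.onStartup(StartOpt.REGULAR));
        // Rollback: the trashed block comes back.
        System.out.println("restored on rollback: " + dn.onStartup(StartOpt.ROLLBACK));
    }
}
```

Under this assumption, a plain restart during a rolling upgrade produces no block movement at all, which is the churn the report is trying to avoid.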

2) When the heartbeat response no longer indicates that a rolling upgrade is in progress

  /**
   * Signal the current rolling upgrade status as indicated by the NN.
   * @param inProgress true if a rolling upgrade is in progress
   */
  void signalRollingUpgrade(boolean inProgress) throws IOException {
    String bpid = getBlockPoolId();
    if (inProgress) {
      dn.getFSDataset().enableTrash(bpid);
      dn.getFSDataset().setRollingUpgradeMarker(bpid);
    } else {
      dn.getFSDataset().restoreTrash(bpid);
      dn.getFSDataset().clearRollingUpgradeMarker(bpid);
    }
  }
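
As a rough illustration of the alternative, the sketch below models a heartbeat handler that never restores from trash: it enables trash while an upgrade is in progress and simply empties it once the upgrade is no longer reported. It assumes that blocks left in trash at that point are safe to discard, which this report does not confirm. `HeartbeatTrashSketch` and all of its members are hypothetical names, not the `FsDatasetSpi` API.

```java
import java.util.ArrayList;
import java.util.List;

public class HeartbeatTrashSketch {
    // In-memory stand-ins for the block files under "current" and "trash".
    final List<String> current = new ArrayList<>();
    final List<String> trash = new ArrayList<>();
    boolean trashEnabled = false;

    /** Deleting a block moves it to trash only while trash is enabled. */
    void deleteBlock(String blk) {
        if (current.remove(blk) && trashEnabled) {
            trash.add(blk);
        }
    }

    /** Hypothetical handler: enables trash or empties it, but never restores. */
    void signalRollingUpgrade(boolean inProgress) {
        if (inProgress) {
            trashEnabled = true;
        } else if (trashEnabled) {
            // Assumption: the upgrade is over, so trashed blocks are discarded.
            trash.clear();
            trashEnabled = false;
        }
    }

    public static void main(String[] args) {
        HeartbeatTrashSketch dn = new HeartbeatTrashSketch();
        dn.current.add("blk_1");
        dn.current.add("blk_2");
        dn.signalRollingUpgrade(true);
        dn.deleteBlock("blk_1");          // goes to trash
        dn.signalRollingUpgrade(false);   // trash is emptied, not restored
        System.out.println("current=" + dn.current + " trash=" + dn.trash);
    }
}
```

In this model a deleted block never reappears in `current` as a side effect of a heartbeat, so the namenode would not see previously deleted replicas come back.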

HDFS-6800 and HDFS-6981 both modified this behavior, so it is not completely clear whether it is somehow intentional.

Attachments

- HDFS-7645.01.patch (24/Feb/15 02:24, 1 kB, Keisuke Ogiwara)
- HDFS-7645.02.patch (04/Mar/15 10:16, 8 kB, Keisuke Ogiwara)
- HDFS-7645.03.patch (05/Mar/15 10:24, 6 kB, Keisuke Ogiwara)
- HDFS-7645.04.patch (17/Mar/15 06:14, 7 kB, Keisuke Ogiwara)
- HDFS-7645.05.patch (17/Mar/15 17:00, 18 kB, Vinayakumar B)
- HDFS-7645.06.patch (27/Mar/15 10:28, 22 kB, Vinayakumar B)
- HDFS-7645.07.patch (30/Mar/15 11:30, 22 kB, Vinayakumar B)

Issue Links

- breaks: HDFS-9426 Rollingupgrade finalization is not backward compatible (Closed)
- is depended upon by: HDFS-8676 Delayed rolling upgrade finalization can cause heartbeat expiration and write failures (Closed)
- is related to: HDFS-8656 Preserve compatibility of ClientProtocol#rollingUpgrade after finalization (Closed)
- relates to: HDFS-7842 Blocks missed while performing downgrade immediately after rolling back the cluster. (Resolved)

People

Assignee: Keisuke Ogiwara
Reporter: Nathan Roberts
Votes: 0
Watchers: 17

Dates

Created: 20/Jan/15 21:12
Updated: 01/Dec/16 23:26
Resolved: 30/Mar/15 22:27