Hadoop HDFS / HDFS-14333

Datanode fails to start if any disk has errors during Namenode registration


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0, 3.2.1, 3.1.3
    • Component/s: datanode
    • Labels: None

    Description

      This is closely related to HDFS-9908, where it was reported that a datanode would fail to start if an IO error occurred on a single disk when running du during datanode registration. That Jira was closed because HADOOP-12973 refactored how du is called and prevents any exception from being thrown. However, the problem can still occur if a volume has errors (e.g. permission problems or filesystem corruption) when the disk is scanned to load all the replicas. The method chain is:

      DataNode.initBlockPool -> FsDatasetImpl.addBlockPool -> FsVolumeList.getAllVolumesMap -> throws an exception which goes unhandled.
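
      To make the failure path concrete, here is a minimal, self-contained sketch of the behaviour described above. The class and method names are simplified stand-ins for the real Hadoop code, not the actual implementation: nothing on the path from initBlockPool down to the per-volume scan catches the IOException, so a single bad disk aborts block pool initialisation for the whole datanode.

      import java.io.IOException;
      import java.util.List;

      class VolumeScanSketch {

          // Stand-in for FsVolumeList.getAllVolumesMap(): scan every volume and let
          // any IOException from a single bad volume propagate upward unhandled.
          static void getAllVolumesMap(List<String> volumes) throws IOException {
              for (String volume : volumes) {
                  scanVolume(volume); // throws if the volume has permission/filesystem errors
              }
          }

          // Stand-in for the per-volume replica scan that fails on the bad disk.
          static void scanVolume(String volume) throws IOException {
              if (volume.contains("bad")) {
                  throw new IOException(
                      "Invalid directory or I/O error occurred for dir: " + volume);
              }
          }

          // Stand-in for DataNode.initBlockPool() -> FsDatasetImpl.addBlockPool():
          // there is no try/catch on this path, so the exception kills registration.
          public static void main(String[] args) throws IOException {
              getAllVolumesMap(List.of("/data/1/dfs/dn", "/data/18-bad/dfs/dn"));
          }
      }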

      The DN logs contain a stack trace for the problem volume, so the workaround is to remove that volume from the DN config and the DN will start. However, the logs are a little confusing, so it is not always obvious what the issue is.
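
      As an illustration of that workaround (the directory names here are placeholders, not taken from the logs below), the failing volume is dropped from dfs.datanode.data.dir in hdfs-site.xml so the DN can start on its remaining volumes:

      <!-- hdfs-site.xml: example only; list your real data directories,
           omitting the volume that is throwing I/O errors (here /data/18). -->
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
      </property>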

      These are cut-down logs from an occurrence of this issue.

      2019-03-01 08:58:24,830 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Scanning block pool BP-240961797-x.x.x.x-1392827522027 on volume /data/18/dfs/dn/current...
      ...
      2019-03-01 08:58:27,029 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Could not get disk usage information
      ExitCodeException exitCode=1: du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215': Permission denied
      du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir213': Permission denied
      du: cannot read directory `/data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir97/subdir25': Permission denied
      
      	at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
      	at org.apache.hadoop.util.Shell.run(Shell.java:504)
      	at org.apache.hadoop.fs.DU$DUShell.startRefresh(DU.java:61)
      	at org.apache.hadoop.fs.DU.refresh(DU.java:53)
      	at org.apache.hadoop.fs.CachingGetSpaceUsed.init(CachingGetSpaceUsed.java:84)
      	at org.apache.hadoop.fs.GetSpaceUsed$Builder.build(GetSpaceUsed.java:166)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.<init>(BlockPoolSlice.java:145)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:881)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$2.run(FsVolumeList.java:412)
      ...
      2019-03-01 08:58:27,043 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time taken to scan block pool BP-240961797-x.x.x.x-1392827522027 on /data/18/dfs/dn/current: 2202ms
      

      So we can see that a du error occurred and was logged but not re-thrown (due to HADOOP-12973), and the block pool scan completed. However, in the 'add replicas to map' logic we then got another exception stemming from the same problem:

      2019-03-01 08:58:27,564 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Adding replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume /data/18/dfs/dn/current...
      ...
      2019-03-01 08:58:31,155 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Caught exception while adding replicas from /data/18/dfs/dn/current. Will throw later.
      java.io.IOException: Invalid directory or I/O error occurred for dir: /data/18/dfs/dn/current/BP-240961797-x.x.x.x-1392827522027/current/finalized/subdir149/subdir215
      	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1167)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:861)
      	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)
      
      < The message 2019-03-01 08:59:00,989 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-240961797-x.x.x.x-1392827522027 on volume xxx did not appear for this volume as it failed >
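
      The contrast between the two code paths can be sketched roughly as follows (simplified, hypothetical method names, not the actual Hadoop classes): the du refresh swallows the failure and only logs a warning, while the replica-map path records the exception and re-throws it later, which is the "Will throw later" message above.

      import java.io.IOException;

      class ErrorHandlingSketch {

          // Roughly the post-HADOOP-12973 du path: the failure is logged as a
          // warning and the scan continues, so no exception reaches the caller.
          static long refreshDiskUsage(String dir) {
              try {
                  return runDu(dir); // may fail on an unreadable subdirectory
              } catch (IOException e) {
                  System.err.println("WARN Could not get disk usage information: " + e.getMessage());
                  return 0L; // carry on with a stale/zero value
              }
          }

          // Roughly the replica-scan path: the exception is remembered and thrown
          // later, which is what ultimately fails datanode registration.
          static void addReplicasToMap(String dir) throws IOException {
              IOException deferred = null;
              try {
                  listReplicas(dir);
              } catch (IOException e) {
                  System.err.println("INFO Caught exception while adding replicas from "
                      + dir + ". Will throw later.");
                  deferred = e;
              }
              if (deferred != null) {
                  throw deferred;
              }
          }

          static long runDu(String dir) throws IOException {
              throw new IOException("du: cannot read directory " + dir + ": Permission denied");
          }

          static void listReplicas(String dir) throws IOException {
              throw new IOException("Invalid directory or I/O error occurred for dir: " + dir);
          }

          public static void main(String[] args) {
              refreshDiskUsage("/data/18/dfs/dn");          // logs a warning, keeps going
              try {
                  addReplicasToMap("/data/18/dfs/dn");      // logs, then re-throws
              } catch (IOException e) {
                  System.err.println("Registration would now fail: " + e.getMessage());
              }
          }
      }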
      

      The exception is re-thrown, so the DN fails registration and then retries. On the retry it finds all volumes already locked and exits with an 'all volumes failed' error.

      I believe we should handle the failing volume like a runtime volume failure and only abort the DN if too many volumes have failed.
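
      A minimal sketch of the intended behaviour (this is not the actual patch, and the tolerance constant is just a stand-in for dfs.datanode.failed.volumes.tolerated): catch the per-volume failure while building the replica map, mark that volume as failed, and only abort the DN when too many volumes have failed.

      import java.io.IOException;
      import java.util.ArrayList;
      import java.util.List;

      class VolumeFailureToleranceSketch {

          // Stand-in for dfs.datanode.failed.volumes.tolerated.
          static final int FAILED_VOLUMES_TOLERATED = 1;

          static void addBlockPool(List<String> volumes) throws IOException {
              List<String> failedVolumes = new ArrayList<>();
              for (String volume : volumes) {
                  try {
                      scanVolume(volume); // load replicas for this volume
                  } catch (IOException e) {
                      // Treat the error like a runtime volume failure instead of
                      // letting it abort the whole datanode.
                      failedVolumes.add(volume);
                      System.err.println("WARN Failed to scan volume " + volume + ": " + e.getMessage());
                  }
              }
              // Only abort if more volumes failed than the configured tolerance.
              if (failedVolumes.size() > FAILED_VOLUMES_TOLERATED) {
                  throw new IOException("Too many failed volumes: " + failedVolumes);
              }
          }

          static void scanVolume(String volume) throws IOException {
              if (volume.contains("bad")) {
                  throw new IOException("I/O error on " + volume);
              }
          }

          public static void main(String[] args) throws IOException {
              // One bad volume out of three is within the tolerance, so the DN starts.
              addBlockPool(List.of("/data/1/dfs/dn", "/data/2/dfs/dn", "/data/18-bad/dfs/dn"));
          }
      }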

      I will post a patch for this.

      Attachments

        1. HDFS-14333.005.patch
          17 kB
          Stephen O'Donnell
        2. HDFS-14333.004.patch
          16 kB
          Stephen O'Donnell
        3. HDFS-14333.003.patch
          16 kB
          Stephen O'Donnell
        4. HADOOP-16119.poc.patch
          10 kB
          Wei-Chiu Chuang
        5. HDFS-14333.002.patch
          11 kB
          Stephen O'Donnell
        6. HDFS-14333.001.patch
          6 kB
          Stephen O'Donnell

            People

              Assignee: sodonnell (Stephen O'Donnell)
              Reporter: sodonnell (Stephen O'Donnell)
              Votes: 0
              Watchers: 6
