[HDFS-13768] Adding replicas to volume map makes DataNode start slowly - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.1.0
Fix Version/s: 2.10.0, 3.2.0, 3.1.2
Component/s: None
Labels: None
Target Version/s: 2.10.0, 3.2.0, 3.1.2
Hadoop Flags: Reviewed

Description

We found that DataNodes (DNs) start very slowly during a rolling upgrade of our cluster. After a restart, a DN takes a long time to come up and does not register with the NameNode (NN) promptly. This causes many errors like the following:

DataXceiver error processing WRITE_BLOCK operation  src: /xx.xx.xx.xx:64360 dst: /xx.xx.xx.xx:50010
java.io.IOException: Not ready to serve the block pool, BP-1508644862-xx.xx.xx.xx-1493781183457.
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAndWaitForBP(DataXceiver.java:1290)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.checkAccess(DataXceiver.java:1298)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:630)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
        at java.lang.Thread.run(Thread.java:745)

Looking at the DN startup logic, the DN initializes the block pool before it registers with the NN. During block pool initialization, we found that adding replicas to the volume map is the most expensive operation. Related log:

2018-07-26 10:46:23,771 INFO [Thread-105] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/1/dfs/dn/current: 242722ms
2018-07-26 10:46:26,231 INFO [Thread-109] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/5/dfs/dn/current: 245182ms
2018-07-26 10:46:32,146 INFO [Thread-112] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/8/dfs/dn/current: 251097ms
2018-07-26 10:47:08,283 INFO [Thread-106] org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Time to add replicas to map for block pool BP-1508644862-xx.xx.xx.xx-1493781183457 on volume /home/hard_disk/2/dfs/dn/current: 287235ms

Currently the DN uses an independent thread per volume to scan and add replicas, but startup still has to wait for the slowest thread to finish its work (287235 ms, nearly five minutes, in the log above). So the main question is how to make each of these threads run faster.
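To illustrate the per-volume threading described above, here is a minimal sketch. It is not the actual FsVolumeList code: the class name PerVolumeScanSketch, the method buildReplicaMap, and the in-memory "volumes" map are hypothetical stand-ins for the real on-disk scan. It only shows why total time is bounded by the slowest volume: work inside each thread is sequential, and the caller joins every thread before continuing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Hadoop code: one scanner thread per volume.
final class PerVolumeScanSketch {
    /**
     * Starts one thread per volume and blocks until all of them finish.
     * "volumes" maps an assumed volume path to the block names found on it.
     * Returns a map from block name to the volume that holds it.
     */
    static Map<String, String> buildReplicaMap(Map<String, List<String>> volumes)
            throws InterruptedException {
        Map<String, String> replicaToVolume = new ConcurrentHashMap<>();
        List<Thread> scanners = new ArrayList<>();
        for (Map.Entry<String, List<String>> volume : volumes.entrySet()) {
            Thread scanner = new Thread(() -> {
                // Within a single volume the work is strictly sequential.
                for (String block : volume.getValue()) {
                    replicaToVolume.put(block, volume.getKey());
                }
            });
            scanners.add(scanner);
            scanner.start();
        }
        for (Thread scanner : scanners) {
            // The last join returns only once the slowest scanner is done.
            scanner.join();
        }
        return replicaToVolume;
    }
}
```

With this structure, adding more volumes does not help a single large volume: its thread still walks every directory one after another.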

The jstack output captured while the DN was blocked adding replicas:

"Thread-113" #419 daemon prio=5 os_prio=0 tid=0x00007f40879ff000 nid=0x145da runnable [0x00007f4043a38000]
   java.lang.Thread.State: RUNNABLE
	at java.io.UnixFileSystem.list(Native Method)
	at java.io.File.list(File.java:1122)
	at java.io.File.listFiles(File.java:1207)
	at org.apache.hadoop.fs.FileUtil.listFiles(FileUtil.java:1165)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:445)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(BlockPoolSlice.java:448)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.getVolumeMap(BlockPoolSlice.java:342)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getVolumeMap(FsVolumeImpl.java:864)
	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList$1.run(FsVolumeList.java:191)

One possible improvement is to use a ForkJoinPool to run this recursive task in parallel rather than synchronously. This should be a significant improvement because it can greatly speed up the recovery process.
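A minimal sketch of the ForkJoinPool idea, under stated assumptions: this is not the attached patch. The names ReplicaScanSketch, ScanDirTask, and scan are hypothetical, and the filter (file names starting with "blk_" and not ending in ".meta") is a simplified stand-in for the real replica handling in BlockPoolSlice. Each subdirectory becomes its own subtask, so a deep directory tree on a single volume can be listed by several worker threads at once.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch, not the HDFS-13768 patch.
final class ReplicaScanSketch {
    /** Scans one directory; every subdirectory is forked as a separate subtask. */
    static final class ScanDirTask extends RecursiveAction {
        private final File dir;
        private final Queue<File> found;

        ScanDirTask(File dir, Queue<File> found) {
            this.dir = dir;
            this.found = found;
        }

        @Override
        protected void compute() {
            File[] entries = dir.listFiles();
            if (entries == null) {
                return; // not a directory, or an I/O error while listing
            }
            List<ScanDirTask> subTasks = new ArrayList<>();
            for (File entry : entries) {
                if (entry.isDirectory()) {
                    subTasks.add(new ScanDirTask(entry, found));
                } else if (entry.getName().startsWith("blk_")
                        && !entry.getName().endsWith(".meta")) {
                    // Assumed naming filter; thread-safe queue collects results.
                    found.add(entry);
                }
            }
            // Runs all subdirectory scans, possibly in parallel, and waits for them.
            invokeAll(subTasks);
        }
    }

    /** Scans the tree under root with the given parallelism and returns block files. */
    static Queue<File> scan(File root, int parallelism) {
        Queue<File> found = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new ScanDirTask(root, found));
        } finally {
            pool.shutdown();
        }
        return found;
    }
}
```

A call such as ReplicaScanSketch.scan(new File("/path/to/volume"), 4) would walk the tree with up to four workers. How much this helps in practice depends on the disk: listing directories is I/O-bound, so the gain comes from overlapping many small directory reads rather than from CPU parallelism.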

Attachments

HDFS-13768.patch               01/Sep/18 08:00   4 kB    Ranith Sardar
HDFS-13768.01.patch            11/Sep/18 02:54   9 kB    Surendra Singh Lilhore
HDFS-13768.02.patch            12/Sep/18 11:08   9 kB    Surendra Singh Lilhore
screenshot-1.png               18/Sep/18 07:46   32 kB   Surendra Singh Lilhore
HDFS-13768.03.patch            22/Sep/18 16:52   21 kB   Surendra Singh Lilhore
HDFS-13768.04.patch            23/Sep/18 03:17   21 kB   Surendra Singh Lilhore
HDFS-13768.05.patch            26/Sep/18 12:30   23 kB   Surendra Singh Lilhore
HDFS-13768.06.patch            27/Sep/18 14:53   23 kB   Surendra Singh Lilhore
HDFS-13768.07.patch            28/Sep/18 03:34   23 kB   Yiqun Lin
HDFS-13768.01-branch-2.patch   04/Oct/18 10:31   25 kB   Surendra Singh Lilhore
HDFS-13768-branch-2.01.patch   05/Oct/18 08:10   25 kB   Yiqun Lin
HDFS-13768-branch-2.02.patch   07/Oct/18 19:47   25 kB   Surendra Singh Lilhore
HDFS-13768-branch-2.03.patch   08/Oct/18 09:40   25 kB   Surendra Singh Lilhore

Issue Links

causes
HDFS-14251 BlockPoolSlice ForkJoinPool introduced by HDFS-13768 breaks under Java SecurityManager (Open)

is related to
HDFS-13962 Add null check for add-replica pool to avoid lock acquiring (Resolved)
SOLR-9515 Update to Hadoop 3 (Closed)
