[HDFS-8674] Improve performance of postponed block scans - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.8.0, 3.0.0-alpha2
Component/s: namenode
Labels:
None

Target Version/s:

2.8.0, 3.0.0-alpha2
Hadoop Flags:

Reviewed

Description

When a standby goes active, it marks all nodes as "stale" which will cause block invalidations for over-replicated blocks to be queued until full block reports are received from the nodes with the block. The replication monitor scans the queue with O(N) runtime. It picks a random offset and iterates through the set to randomize blocks scanned.

The result is devastating when a cluster loses multiple nodes during a rolling upgrade. Re-replication occurs, the nodes come back, the excess block invalidations are postponed. Rescanning just 2k blocks out of millions of postponed blocks may take multiple seconds. During the scan, the write lock is held which stalls all other processing.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HDFS-8674.2.patch
29/Nov/16 21:33
7 kB
Daryn Sharp
HDFS-8674.branch-2.patch
01/Dec/16 15:38
7 kB
Daryn Sharp
HDFS-8674.patch
16/Dec/15 20:48
6 kB
Daryn Sharp
HDFS-8674.patch
26/Jun/15 17:39
6 kB
Daryn Sharp
HDFS-8674.trunk.2.patch
01/Dec/16 15:38
8 kB
Daryn Sharp
HDFS-8674.trunk.patch
30/Nov/16 21:03
8 kB
Daryn Sharp
HDFS-8674-branch-2.7.patch
27/May/17 02:05
7 kB
Konstantin Shvachko

Issue Links

relates to

HDFS-11868 Backport HDFS-8674 to branch 2.7

Resolved

Activity

People

Assignee:: Daryn Sharp

Reporter:: Daryn Sharp

Votes:: 0 Vote for this issue

Watchers:: 18 Start watching this issue

Dates

Created:: 26/Jun/15 16:22

Updated:: 27/May/17 02:05

Resolved:: 01/Dec/16 18:22