[HDFS-8676] Delayed rolling upgrade finalization can cause heartbeat expiration and write failures - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0, 2.7.2, 3.0.0-alpha1
Component/s: None
Labels: None
Target Version/s: 2.7.2, 2.6.5
Hadoop Flags: Reviewed

Description

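In large, busy clusters with a high block deletion rate, many blocks can pile up in the DataNode trash directories until a rolling upgrade is finalized. During a rolling upgrade, deleted blocks are moved to trash instead of being removed, so they can be restored on rollback. When the upgrade is finally finalized, the trash is deleted synchronously in the BPServiceActor thread's context. That is the same thread that sends heartbeats to the NameNode, so a long deletion blocks heartbeats and can cause heartbeat expiration, after which the NameNode considers the DataNode dead.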

We have seen a NameNode lose hundreds of DataNodes after a delayed upgrade finalization. The deletion of trash directories should be made asynchronous.
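As a rough illustration of the asynchronous approach proposed above, the following sketch moves trash deletion onto a background worker, so the thread that handles finalization (in the DataNode, the thread that also sends heartbeats) returns immediately. It is written in Python for brevity and is not the HDFS-8676 patch. `AsyncTrashDeleter`, `heartbeat_loop`, and the directory names are hypothetical and were chosen only for this example.

```python
import shutil
import tempfile
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Optional


class AsyncTrashDeleter:
    """Hypothetical helper, not the HDFS-8676 patch: deletes trash
    directories on a background thread so the thread that requests
    finalization never waits on filesystem I/O."""

    def __init__(self, delete_fn: Optional[Callable[[Path], None]] = None) -> None:
        # Default: recursive delete that tolerates already-missing files.
        self._delete_fn = delete_fn or (lambda p: shutil.rmtree(p, ignore_errors=True))
        # A single worker runs deletions one at a time, off the caller's thread.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="trash-deleter")

    def finalize(self, trash_dirs: List[Path]) -> List[Future]:
        # Queue one deletion per directory and return immediately.
        return [self._executor.submit(self._delete_fn, d) for d in trash_dirs]

    def shutdown(self) -> None:
        # Block until all queued deletions have finished.
        self._executor.shutdown(wait=True)


def heartbeat_loop(deleter: AsyncTrashDeleter, trash_dirs: List[Path], beats: int) -> List[int]:
    """Hypothetical stand-in for the actor loop: finalization arrives on
    beat 0, and later beats are not delayed by the deletion."""
    sent = []
    for beat in range(beats):
        if beat == 0:
            deleter.finalize(trash_dirs)  # returns without waiting
        sent.append(beat)  # stand-in for sending a heartbeat
    return sent


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    trash = root / "trash"  # hypothetical trash directory
    (trash / "subdir0").mkdir(parents=True)
    (trash / "subdir0" / "blk_1001").write_bytes(b"block data")

    deleter = AsyncTrashDeleter()
    print(heartbeat_loop(deleter, [trash], beats=3))  # [0, 1, 2]
    deleter.shutdown()
    print(trash.exists())  # False
    shutil.rmtree(root)
```

Using a single worker thread is one reasonable choice: it keeps a large trash backlog from being deleted in parallel bursts. The returned futures let callers wait for completion at shutdown or in tests without putting that wait on the heartbeat path.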

Attachments


- HDFS-8676.01.patch, 3 kB, Walter Su, 29/Sep/15 06:20
- HDFS-8676.02.patch, 3 kB, Walter Su, 09/Oct/15 08:30

Issue Links

depends upon
- HDFS-7645 Rolling upgrade is restoring blocks from trash multiple times (Closed)

People

Assignee: Walter Su
Reporter: Kihwal Lee
Votes: 0
Watchers: 10

Dates

Created: 26/Jun/15 16:32
Updated: 06/Jan/17 00:51
Resolved: 13/Oct/15 18:26