[HDFS-12645] FSDatasetImpl lock will stall BP service actors and may cause missing blocks - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.8.0
Fix Version/s: None
Component/s: datanode
Labels: None

Description

The DN is extremely susceptible to a slow volume due to bad locking practices. DN operations require the FsDataset lock. IO while holding the dataset lock should not be permitted, as it leads to severe performance degradation and possibly (temporarily) missing blocks.
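As a minimal sketch of the pattern being asked for, the example below updates in-memory state under the lock and performs the slow disk work only after releasing it. The class `DatasetSketch`, its methods, and the `slowDiskDelete` placeholder are hypothetical names for illustration; this is not the actual FsDatasetImpl code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a dataset guarded by a single lock.
// Not the real FsDatasetImpl; names are assumptions for illustration.
class DatasetSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> blockPaths = new HashMap<>();
    // Set if the slow operation is ever invoked with the lock held.
    private volatile boolean ioRanUnderLock = false;

    void addBlock(long blockId, String path) {
        datasetLock.lock();
        try {
            blockPaths.put(blockId, path); // memory-only update
        } finally {
            datasetLock.unlock();
        }
    }

    // Remove the block from the in-memory map under the lock, then do the
    // potentially slow disk work after the lock has been released.
    boolean invalidate(long blockId) {
        String path;
        datasetLock.lock();
        try {
            path = blockPaths.remove(blockId);
        } finally {
            datasetLock.unlock();
        }
        if (path == null) {
            return false; // unknown block, nothing to delete
        }
        slowDiskDelete(path); // runs without the dataset lock
        return true;
    }

    // Placeholder for real file deletion on a possibly slow volume.
    private void slowDiskDelete(String path) {
        if (datasetLock.isHeldByCurrentThread()) {
            ioRanUnderLock = true;
        }
    }

    boolean ioRanUnderLock() {
        return ioRanUnderLock;
    }
}
```

With this shape, a stalled volume delays only the thread doing the delete; other threads that need the lock for memory-only operations are not queued behind the disk.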

A slow disk will cause pipelines to experience significant latency and timeouts, which increases lock/IO contention during cleanup and leads to more timeouts, and so on. Meanwhile, the actor service thread interleaves multiple lock acquires/releases with xceivers. If many commands are issued, the node may be incorrectly declared dead.

HDFS-12639 documents that both actors synchronize on the offer service lock while processing commands. A backlogged active actor will therefore block the standby actor and cause it to go dead too.
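One possible direction, sketched here under assumptions, is to hand commands to a separate executor so that the thread receiving them returns immediately instead of processing them inline. `AsyncCommandSketch`, `offerCommand`, and the string-typed commands are hypothetical; the real actor and command classes are not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: commands are queued to a worker thread so the
// caller (standing in for a heartbeat/actor thread) is not blocked.
class AsyncCommandSketch {
    // Single worker preserves the order in which commands were offered.
    private final ExecutorService commandExecutor =
        Executors.newSingleThreadExecutor();
    private final List<String> processed =
        Collections.synchronizedList(new ArrayList<>());

    // Called from the heartbeat path; only enqueues and returns.
    void offerCommand(String command) {
        commandExecutor.execute(() -> process(command));
    }

    // Placeholder for slow command handling (e.g. work that hits a slow disk).
    private void process(String command) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        processed.add(command);
    }

    // Stop accepting commands and wait for queued ones to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        commandExecutor.shutdown();
        try {
            return commandExecutor.awaitTermination(
                timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> processed() {
        return new ArrayList<>(processed);
    }
}
```

An unbounded queue only moves the backlog, so a real design would also need a bound or some feedback to the sender once the queue grows.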

Issue Links

is related to: HDFS-12639 BPOfferService lock may stall all service actors (Open)

Sub-Tasks

1.	Avoid IO while holding the FsDataset lock	Open	Unassigned
2.	DN commands processing should be async	Open	Nandakumar
3.	DN should provide feedback to NN for throttling commands	Open	Hanisha Koneru

People

Assignee: Unassigned

Reporter: Daryn Sharp

Votes: 0

Watchers: 13

Dates

Created: 12/Oct/17 14:22

Updated: 12/Oct/17 14:56