[HDFS-7714] Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode. - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.7.0, 2.6.1, 3.0.0-alpha1
Component/s: datanode
Labels: 2.6.1-candidate
Target Version/s: 2.7.0
Hadoop Flags: Reviewed

Description

In an HA deployment, each DataNode must register with both NameNodes and send periodic heartbeats and block reports to both. If the NameNodes and DataNodes are restarted simultaneously, a race condition can occur during registration. The result is that the BPServiceActor for one NameNode terminates while the BPServiceActor for the other NameNode remains alive. The DataNode process is then in a "half-alive" state in which it heartbeats and sends block reports to only one of the NameNodes. This could cause a loss of storage capacity after an HA failover, since the NameNode that becomes active may be the one the DataNode no longer reports to. The DataNode process has to be restarted to recover.
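The "half-alive" state described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `HalfAliveSketch`, `Actor`, `liveNameNodes`, and `isHalfAlive` are hypothetical names invented for illustration, the NameNode IDs `nn1` and `nn2` are placeholders, and none of this is the real `BPServiceActor` code or the fix applied in the attached patches. It only shows why a process with one terminated actor and one live actor looks healthy to one NameNode and absent to the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HalfAliveSketch {

    /** Hypothetical stand-in for one per-NameNode service actor (not the real BPServiceActor). */
    public static final class Actor {
        public final String nameNode;
        private boolean alive = true;

        public Actor(String nameNode) {
            this.nameNode = nameNode;
        }

        /** Simulates a registration attempt; a fatal failure ends this actor only. */
        public void register(boolean registrationSucceeds) {
            if (!registrationSucceeds) {
                alive = false; // the other actor is not affected
            }
        }

        public boolean isAlive() {
            return alive;
        }
    }

    /** Returns the NameNodes that still have a live actor, i.e. still receive heartbeats. */
    public static List<String> liveNameNodes(List<Actor> actors) {
        List<String> live = new ArrayList<>();
        for (Actor actor : actors) {
            if (actor.isAlive()) {
                live.add(actor.nameNode);
            }
        }
        return live;
    }

    /** "Half-alive": at least one actor is still running, but not all of them. */
    public static boolean isHalfAlive(List<Actor> actors) {
        int live = liveNameNodes(actors).size();
        return live > 0 && live < actors.size();
    }

    public static void main(String[] args) {
        List<Actor> actors = Arrays.asList(new Actor("nn1"), new Actor("nn2"));

        actors.get(0).register(true);   // registration with nn1 succeeds
        actors.get(1).register(false);  // registration with nn2 fails (assumed outcome of the race)

        System.out.println("NameNodes still served: " + liveNameNodes(actors));
        System.out.println("half-alive: " + isHalfAlive(actors));
    }
}
```

In this model the process keeps running because one actor survives, so nothing forces a restart, yet `nn2` never hears from the DataNode again. A check along the lines of `isHalfAlive` is one way such a state could be detected; whether the actual fix takes that approach is not stated in this issue description.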

Attachments

- HDFS-7714-002.patch (09/Feb/15 05:13, 1 kB, Vinayakumar B)
- HDFS-7714-001.patch (01/Feb/15 12:51, 2 kB, Vinayakumar B)

Issue Links

is related to

- HDFS-2882 DN continues to start up, even if block pool fails to initialize (Closed)
- HDFS-7009 Active NN and standby NN have different live nodes (Closed)

People

Assignee: Vinayakumar B
Reporter: Chris Nauroth
Votes: 0
Watchers: 14

Dates

Created: 30/Jan/15 21:05
Updated: 30/Aug/16 01:39
Resolved: 10/Feb/15 05:17