[HDFS-2949] HA: Add check to active state transition to prevent operator-induced split brain - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.0.0-alpha
Fix Version/s: 2.5.0
Component/s: ha, namenode
Labels: None
Target Version/s: 2.5.0
Hadoop Flags: Reviewed

Description

Currently, if the administrator mistakenly calls "-transitionToActive" on one NN while the other one is still active, both NNs end up active at the same time (split brain). We can add a simple check by having the NN make a getServiceState() RPC to its peer with a short (~1 second?) timeout. If the RPC succeeds and indicates that the other node is active, the NN should refuse to enter active mode. If the RPC fails or indicates standby, it can proceed.

This is only meant as a preventative safety check; we still expect users to use the "-failover" command, which has other checks plus fencing built in.
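A minimal Java sketch of the proposed check, under simplifying assumptions: `HAState`, `PeerStateClient`, and `mayBecomeActive` are hypothetical names for illustration, not Hadoop's actual HA classes, and the peer's getServiceState() RPC is simulated by a lambda. The timeout is a parameter because the description only suggests roughly one second. The transition is refused only when the peer answers in time and reports active. An RPC failure, a timeout, or a standby answer lets it proceed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ActiveTransitionGuard {

    /** Simplified HA states (hypothetical; real HA has more states). */
    enum HAState { ACTIVE, STANDBY }

    /** Hypothetical stand-in for the peer NN's getServiceState() RPC. */
    interface PeerStateClient {
        HAState getServiceState() throws Exception;
    }

    /**
     * Returns true if this NN may proceed with transitionToActive.
     * Refuses only if the peer answers within timeoutMillis and reports ACTIVE.
     */
    static boolean mayBecomeActive(PeerStateClient peer, long timeoutMillis) {
        // Run the (simulated) RPC asynchronously so the wait can be bounded.
        CompletableFuture<HAState> future = CompletableFuture.supplyAsync(() -> {
            try {
                return peer.getServiceState();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        });
        try {
            HAState peerState = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return peerState != HAState.ACTIVE; // peer active: refuse
        } catch (TimeoutException e) {
            return true;  // peer did not answer in time: proceed
        } catch (ExecutionException e) {
            return true;  // RPC failed: proceed
        } catch (InterruptedException e) {
            // Design choice of this sketch: the check never completed, so refuse.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(mayBecomeActive(() -> HAState.ACTIVE, 1000));  // false
        System.out.println(mayBecomeActive(() -> HAState.STANDBY, 1000)); // true
        System.out.println(mayBecomeActive(
            () -> { throw new java.io.IOException("connection refused"); }, 1000)); // true
        System.out.println(mayBecomeActive(
            () -> { Thread.sleep(5000); return HAState.ACTIVE; }, 100)); // true
    }
}
```

The check deliberately fails open: an unreachable or slow peer never blocks the transition. That is why it can only act as a safety net and does not replace the fencing that "-failover" provides.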

Attachments

- HDFS-2949-v3.patch (12 kB), Rushabh Shah, 14/May/14 15:10
- HDFS-2949-v2.patch (8 kB), Rushabh Shah, 27/Apr/14 04:12
- HDFS-2949.patch (8 kB), Rushabh Shah, 25/Apr/14 22:01

Issue Links

breaks: YARN-2075 TestRMAdminCLI consistently fail on trunk and branch-2 (Closed)

duplicates: HDFS-6203 check other namenode's state before HAadmin transitionToActive (Resolved)

People

Assignee: Rushabh Shah

Reporter: Todd Lipcon

Votes: 0

Watchers: 9

Dates

Created: 14/Feb/12 21:34

Updated: 10/Mar/15 04:36

Resolved: 14/May/14 20:47