[AMBARI-19289] HDFS Service check fails if previous active NN is down - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.2
Fix Version/s: trunk
Component/s: ambari-server
Labels: None

Description

Steps to reproduce

1. Enable NameNode HA.
2. Shut down the active NameNode; the standby takes over.
3. Run the HDFS service check.

The HDFS service check script uses

hdfs dfsadmin -fs hdfs://mycluster -safemode get | grep OFF

to check whether the NameNode is out of safemode. However, this command fails when the first NN is down; it does not go on to check the state of the second NN. This is likely an HDFS bug similar to HDFS-8277.

Proposal

There are two approaches to fix this:

1. Loop over each NameNode address and query its safemode state with hdfs dfsadmin -fs hdfs://nn_host:8020 -safemode get | grep OFF. As long as one NN returns OFF, consider DFS out of safemode and continue with the rest of the check. However, is such complexity really necessary for a service check?
2. Remove the safemode check code. If HDFS is in safemode, the namespace is read-only, so the check's write operations will fail anyway and the service check won't pass.
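Option 1 could be sketched in shell roughly as follows. This is a hypothetical illustration, not the code from the attached patches: the function name `any_nn_out_of_safemode` and the host:port arguments are assumptions, and a real script would derive the NameNode RPC addresses from hdfs-site.xml (the dfs.namenode.rpc-address.* properties).

```shell
# Hypothetical sketch of option 1: probe each NameNode individually and
# treat HDFS as out of safemode if at least one NameNode reports OFF.
# Arguments are placeholder host:port pairs, e.g. nn1.example.com:8020.
any_nn_out_of_safemode() {
  local addr
  for addr in "$@"; do
    # A NameNode that is down makes this probe fail (no "OFF" in the
    # output); instead of failing the whole check, try the next address.
    if hdfs dfsadmin -fs "hdfs://${addr}" -safemode get 2>/dev/null | grep -q OFF; then
      return 0
    fi
  done
  # No reachable NameNode reported that safemode is OFF.
  return 1
}
```

Called as `any_nn_out_of_safemode nn1.example.com:8020 nn2.example.com:8020`, it succeeds as soon as either NameNode reports OFF, so a stopped first NameNode no longer fails the check. The address handling it needs is the extra complexity the proposal questions.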

I prefer #2 because it makes the script simpler and works in all cases. Note that this is a service check: it should pass as long as HDFS is in a working state. It is not a NameNode check.
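Option #2 reduces the check to an ordinary write/read round trip. A minimal sketch, assuming fs.defaultFS points at the HA nameservice (e.g. hdfs://mycluster) so the HDFS client fails over to whichever NameNode is active; the function name `hdfs_smoke_test`, the default directory, and the payload string are hypothetical, and this is not the code from the attached patches.

```shell
# Hypothetical sketch of option 2: no safemode probe at all. A write/read
# round trip decides; in safemode the write is rejected, so the check fails.
hdfs_smoke_test() {
  local dir="${1:-/tmp/hdfs_service_check}"   # hypothetical default path
  local payload="ambari-service-check"        # hypothetical test content
  local tmp
  tmp="$(mktemp)" || return 1
  printf '%s\n' "$payload" > "$tmp"

  # Paths resolve against fs.defaultFS, so no NameNode is named here.
  if ! hdfs dfs -mkdir -p "$dir" ||
     ! hdfs dfs -put -f "$tmp" "$dir/check.txt"; then
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"

  # Read the file back and compare it with what was written.
  [ "$(hdfs dfs -cat "$dir/check.txt")" = "$payload" ] || return 1

  # Clean up; the function's status is the status of this removal.
  hdfs dfs -rm -r "$dir" >/dev/null
}
```

For example, `hdfs_smoke_test /tmp/hdfs_service_check` returns 0 when the round trip succeeds, regardless of which NameNode is active, and non-zero when the write is rejected, as it would be in safemode.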

Attachments


AMBARI-19289_trunk.01.patch (2 kB), Weiwei Yang, 23/Dec/16 07:02
AMBARI-19289_trunk.02.patch (5 kB), Weiwei Yang, 10/Jan/17 02:29
AMBARI-19289_branch-2.5.01.patch (3 kB), Weiwei Yang, 10/Jan/17 02:34

Issue Links

Links to: ReviewBoard


People

Assignee: Weiwei Yang

Reporter: Weiwei Yang

Votes: 0

Watchers: 2

Dates

Created: 23/Dec/16 06:45

Updated: 13/Jan/17 15:55

Resolved: 13/Jan/17 15:55