[HDFS-6507] Improve DFSAdmin to support HA cluster better - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.5.0
Component/s: tools
Labels: None
Target Version/s: 2.5.0
Hadoop Flags: Reviewed
Tags: dfsadmin

Description

Currently, the commands supported in DFSAdmin can be classified into three categories according to the protocol used:
1. ClientProtocol
Commands in this category are generally implemented by calling the corresponding method of the DFSClient class, which ultimately invokes the corresponding remote method on the NN side. On the NN side, all operations are classified into five categories: UNCHECKED, READ, WRITE, CHECKPOINT, and JOURNAL. The Active NN allows all operations, while the Standby NN allows only UNCHECKED operations. In the current implementation, DFSClient connects to one NN first; if that NN is not Active and the operation is not allowed, it fails over to the second NN. The problem is that some DFSAdmin commands (setSafeMode, saveNameSpace, restoreFailedStorage, refreshNodes, setBalancerBandwidth, metaSave) are classified as UNCHECKED operations, so when they are run from the DFSAdmin command line they are sent to whichever NN is tried first, regardless of whether it is Active or Standby. This may result in two problems:
a. If the first NN tried is Standby, the operation takes effect only on the Standby NN, which is not the expected result.
b. If the operation needs to take effect on both NNs, it takes effect on only one of them. Problems may then surface later, after an NN failover.
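
The behaviour described above can be illustrated with a small model. This is a simplified sketch, not Hadoop code: the `OperationCategory` values come from the description, while `HaState`, `FailoverSketch`, and `firstAccepting` are hypothetical names introduced only for illustration, and the "try NNs in configured order" loop is an assumption standing in for the real client failover logic.

```java
import java.util.EnumSet;
import java.util.Set;

// The five operation categories named in the description.
enum OperationCategory { UNCHECKED, READ, WRITE, CHECKPOINT, JOURNAL }

// Hypothetical model of what each HA state accepts.
enum HaState {
    ACTIVE(EnumSet.allOf(OperationCategory.class)),   // Active NN allows everything
    STANDBY(EnumSet.of(OperationCategory.UNCHECKED)); // Standby NN allows only UNCHECKED

    private final Set<OperationCategory> allowed;

    HaState(Set<OperationCategory> allowed) {
        this.allowed = allowed;
    }

    boolean allows(OperationCategory op) {
        return allowed.contains(op);
    }
}

final class FailoverSketch {
    /**
     * Returns the index of the first NN, in configured order, that accepts
     * the operation, or -1 if none does. Assumed stand-in for client failover.
     */
    static int firstAccepting(HaState[] nns, OperationCategory op) {
        for (int i = 0; i < nns.length; i++) {
            if (nns[i].allows(op)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The first configured NN happens to be Standby.
        HaState[] nns = { HaState.STANDBY, HaState.ACTIVE };
        // A WRITE is rejected by the Standby, so it reaches the Active NN.
        System.out.println(firstAccepting(nns, OperationCategory.WRITE));     // prints 1
        // An UNCHECKED command is accepted by the Standby and never reaches the Active NN.
        System.out.println(firstAccepting(nns, OperationCategory.UNCHECKED)); // prints 0
    }
}
```

In this model an UNCHECKED command stops at index 0, the Standby NN, which corresponds to problem (a) above.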

Here I propose the following improvements:
a. If the command can be classified as a READ, WRITE, CHECKPOINT, or JOURNAL operation, we should classify it explicitly.
b. If the command cannot be classified as one of these four operations, or if it needs to take effect on both NNs, we should send the request to both the Active and Standby NNs.

2. Refresh protocols: RefreshAuthorizationPolicyProtocol, RefreshUserMappingsProtocol, RefreshCallQueueProtocol
Commands in this category, including refreshServiceAcl, refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and refreshCallQueue, are implemented by creating the corresponding RPC proxy and sending the request to the remote NN. In the current implementation, these requests are sent to whichever NN is tried first, regardless of whether it is Active or Standby. Here I propose that we send these requests to both NNs.
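
The proposal to send a request to both NNs can be sketched as a loop over per-NN proxies. This is a minimal illustration under stated assumptions, not the code in the attached patches: `BroadcastSketch`, `AdminCall`, and `runOnAll` are hypothetical names, the `nn1`/`nn2` keys are placeholder NameNode IDs, and the lambdas stand in for real RPC proxies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BroadcastSketch {
    /** Hypothetical stand-in for one admin RPC bound to a single NameNode. */
    interface AdminCall {
        void invoke() throws Exception;
    }

    /**
     * Invokes the call on every NN in iteration order and keeps going after
     * a failure. Returns a map from NN id to error message for the NNs that failed.
     */
    static Map<String, String> runOnAll(Map<String, AdminCall> proxies) {
        Map<String, String> failures = new LinkedHashMap<>();
        for (Map.Entry<String, AdminCall> entry : proxies.entrySet()) {
            try {
                entry.getValue().invoke();
            } catch (Exception ex) {
                // Record the failure instead of aborting, so the remaining NNs are still tried.
                failures.put(entry.getKey(), String.valueOf(ex.getMessage()));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, AdminCall> proxies = new LinkedHashMap<>();
        proxies.put("nn1", () -> System.out.println("refresh applied on nn1"));
        proxies.put("nn2", () -> { throw new java.io.IOException("nn2 unreachable"); });

        Map<String, String> failures = runOnAll(proxies);
        System.out.println("failed: " + failures.keySet());
    }
}
```

The sketch only reports which NNs failed. Whether a partial failure should fail the whole command is a policy decision left to the caller.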

3. ClientDatanodeProtocol
Commands in this category are already handled correctly and need no improvement.

Attachments

HDFS-6507.1.patch, 13/Jun/14 07:58, 52 kB, Zesheng Wu
HDFS-6507.2.patch, 13/Jun/14 15:50, 27 kB, Zesheng Wu
HDFS-6507.3.patch, 17/Jun/14 09:34, 30 kB, Zesheng Wu
HDFS-6507.4.patch, 17/Jun/14 12:18, 32 kB, Zesheng Wu
HDFS-6507.4-inprogress.patch, 17/Jun/14 12:22, 25 kB, Vinayakumar B
HDFS-6507.5.patch, 17/Jun/14 13:10, 33 kB, Zesheng Wu
HDFS-6507.6.patch, 17/Jun/14 16:00, 33 kB, Zesheng Wu
HDFS-6507.7.patch, 18/Jun/14 00:20, 33 kB, Zesheng Wu
HDFS-6507.7.patch, 18/Jun/14 03:43, 33 kB, Zesheng Wu
HDFS-6507.8.patch, 18/Jun/14 06:02, 33 kB, Zesheng Wu

Issue Links

breaks: HDFS-6789 TestDFSClientFailover.testFileContextDoesntDnsResolveLogicalURI and TestDFSClientFailover.testDoesntDnsResolveLogicalURI failing on jdk7 (Closed)

incorporates: HDFS-3744 Decommissioned nodes are included in cluster after switch which is not expected (Resolved)

is related to: HDFS-6693 TestDFSAdminWithHA fails on windows (Closed)

is related to: HDFS-8277 Safemode enter fails when Standby NameNode is down (Patch Available)

People

Assignee: Zesheng Wu

Reporter: Zesheng Wu

Votes: 0

Watchers: 13

Dates

Created: 10/Jun/14 07:19

Updated: 29/Apr/15 04:12

Resolved: 23/Jun/14 05:21