Details
-
Improvement
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
2.7.1
-
None
-
None
Description
Normally the HDFS cluster is HA enabled. It could take a long time when coping huge data by distp. If the source cluster changes active namenode, the distp will run failed. This patch supports the DistCp can read source cluster files in HA access mode. A source cluster configuration file needs to be specified (via the -sourceClusterConf option).
The following is an example of the contents of a source cluster configuration
file:
<configuration> <property> <name>fs.defaultFS</name> <value>hdfs://mycluster</value> </property> <property> <name>dfs.nameservices</name> <value>mycluster</value> </property> <property> <name>dfs.ha.namenodes.mycluster</name> <value>nn1,nn2</value> </property> <property> <name>dfs.namenode.rpc-address.mycluster.nn1</name> <value>host1:9000</value> </property> <property> <name>dfs.namenode.rpc-address.mycluster.nn2</name> <value>host2:9000</value> </property> <property> <name>dfs.namenode.http-address.mycluster.nn1</name> <value>host1:50070</value> </property> <property> <name>dfs.namenode.http-address.mycluster.nn2</name> <value>host2:50070</value> </property> <property> <name>dfs.client.failover.proxy.provider.mycluster</name> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> </property> </configuration>
The invocation of DistCp is as below:
bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
Attachments
Attachments
Issue Links
- is related to
-
HDFS-6376 Distcp data between two HA clusters requires another configuration
- Closed