  1. Hadoop HDFS
  2. HDFS-9868

Add ability for DistCp to run between 2 clusters

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.1
    • Fix Version/s: None
    • Component/s: distcp
    • Labels:
      None

      Description

      HDFS clusters are normally HA-enabled. Copying a large amount of data with DistCp can take a long time, and if the source cluster's active NameNode changes during the copy, the DistCp job fails. This patch lets DistCp read source cluster files in HA access mode. To use it, specify a source cluster configuration file via the -sourceClusterConf option.

      The following is an example of the contents of a source cluster configuration
      file:

          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://mycluster</value>
            </property>
            <property>
              <name>dfs.nameservices</name>
              <value>mycluster</value>
            </property>
            <property>
              <name>dfs.ha.namenodes.mycluster</name>
              <value>nn1,nn2</value>
            </property>
            <property>
              <name>dfs.namenode.rpc-address.mycluster.nn1</name>
              <value>host1:9000</value>
            </property>
            <property>
              <name>dfs.namenode.rpc-address.mycluster.nn2</name>
              <value>host2:9000</value>
            </property>
            <property>
              <name>dfs.namenode.http-address.mycluster.nn1</name>
              <value>host1:50070</value>
            </property>
            <property>
              <name>dfs.namenode.http-address.mycluster.nn2</name>
              <value>host2:50070</value>
            </property>
            <property>
              <name>dfs.client.failover.proxy.provider.mycluster</name>
              <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
            </property>
          </configuration>
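
      A file like the one above can be sanity-checked before a long copy is started. The following is a minimal sketch, not part of the patch: it parses a Hadoop-style configuration and reports HA-related keys that appear to be missing. The helper names (load_properties, missing_ha_keys) are hypothetical, and the set of keys treated as required is an assumption based only on the example above (the http-address entries are not checked).

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def split_csv(value):
    """Split a comma-separated value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_ha_keys(props):
    """Return the keys from the assumed 'required' set that are absent or empty."""
    required = ["fs.defaultFS", "dfs.nameservices"]
    for ns in split_csv(props.get("dfs.nameservices", "")):
        required.append(f"dfs.ha.namenodes.{ns}")
        required.append(f"dfs.client.failover.proxy.provider.{ns}")
        # Every namenode ID listed for the nameservice needs an RPC address.
        for nn in split_csv(props.get(f"dfs.ha.namenodes.{ns}", "")):
            required.append(f"dfs.namenode.rpc-address.{ns}.{nn}")
    return [key for key in required if not props.get(key)]


# Deliberately incomplete sample: nn2 has no RPC address and no
# failover proxy provider is configured.
SAMPLE = """
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>host1:9000</value></property>
</configuration>
"""

for key in missing_ha_keys(load_properties(SAMPLE)):
    print("missing:", key)
```

      In practice the XML text would be read from the file passed to -sourceClusterConf rather than from an in-memory string.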
      

      The invocation of DistCp is as below:

          bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
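
      To illustrate why HA access matters for a long-running copy, here is a simplified simulation of client-side failover: the caller tries each configured NameNode address in turn and moves on when one reports that it is in standby. This is only a conceptual sketch under stated assumptions. It does not reproduce ConfiguredFailoverProxyProvider or any Hadoop API, and StandbyError, call_with_failover, and list_files are hypothetical names.

```python
class StandbyError(Exception):
    """Raised by a (simulated) NameNode that is not currently active."""


def call_with_failover(namenodes, operation):
    """Run `operation` against each address in order; return the first success."""
    last_error = None
    for address in namenodes:
        try:
            return operation(address)
        except StandbyError as err:
            # This NameNode is in standby; remember the error and try the next.
            last_error = err
    raise RuntimeError(
        "no active NameNode among: " + ", ".join(namenodes)
    ) from last_error


# Simulated cluster state: exactly one address is active at a time.
cluster = {"active": "host1:9000"}


def list_files(address):
    """Pretend to list the source directory on the given NameNode."""
    if address != cluster["active"]:
        raise StandbyError(address)
    return ["/foo/bar/part-0"]


addresses = ["host1:9000", "host2:9000"]
print(call_with_failover(addresses, list_files))

# The active NameNode changes mid-copy; the same call still succeeds.
cluster["active"] = "host2:9000"
print(call_with_failover(addresses, list_files))
```

      A client pinned to a single NameNode address has no second address to fall back to, which matches the failure described in this issue.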
      

        Attachments

        1. HDFS-9868.1.patch
          24 kB
          NING DING
        2. HDFS-9868.2.patch
          25 kB
          NING DING
        3. HDFS-9868.3.patch
          26 kB
          NING DING
        4. HDFS-9868.4.patch
          26 kB
          NING DING
        5. HDFS-9868.05.patch
          29 kB
          Xiao Chen
        6. HDFS-9868.06.patch
          34 kB
          Xiao Chen
        7. HDFS-9868.07.patch
          34 kB
          Xiao Chen
        8. HDFS-9868.08.patch
          40 kB
          Xiao Chen
        9. HDFS-9868.09.patch
          44 kB
          Xiao Chen
        10. HDFS-9868.10.patch
          49 kB
          Xiao Chen

              People

              • Assignee:
                iceberg565 NING DING
                Reporter:
                iceberg565 NING DING
              • Votes:
                1
                Watchers:
                12
