Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-9868

Add ability for DistCp to run between 2 clusters

Add voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 2.7.1
    • None
    • distcp
    • None

    Description

      Normally the HDFS cluster is HA enabled. It could take a long time when coping huge data by distp. If the source cluster changes active namenode, the distp will run failed. This patch supports the DistCp can read source cluster files in HA access mode. A source cluster configuration file needs to be specified (via the -sourceClusterConf option).

      The following is an example of the contents of a source cluster configuration
      file:

          <configuration>
            <property>
      		<name>fs.defaultFS</name>
      		<value>hdfs://mycluster</value>
      	  </property>
      	  <property>
      		<name>dfs.nameservices</name>
      		<value>mycluster</value>
      	  </property>
      	  <property>
      		<name>dfs.ha.namenodes.mycluster</name>
      		<value>nn1,nn2</value>
      	  </property>
      	  <property>
      		<name>dfs.namenode.rpc-address.mycluster.nn1</name>
      		<value>host1:9000</value>
      	  </property>
      	  <property>
      		<name>dfs.namenode.rpc-address.mycluster.nn2</name>
      		<value>host2:9000</value>
      	  </property>
      	  <property>
      		<name>dfs.namenode.http-address.mycluster.nn1</name>
      		<value>host1:50070</value>
      	  </property>
      	  <property>
      		<name>dfs.namenode.http-address.mycluster.nn2</name>
      		<value>host2:50070</value>
      	  </property>
      	  <property>
      		<name>dfs.client.failover.proxy.provider.mycluster</name>
      		<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      	  </property>
      	</configuration>
      

      The invocation of DistCp is as below:

          bash$ hadoop distcp -sourceClusterConf sourceCluster.xml /foo/bar hdfs://nn2:8020/bar/foo
      

      Attachments

        1. HDFS-9868.4.patch
          26 kB
          NING DING
        2. HDFS-9868.3.patch
          26 kB
          NING DING
        3. HDFS-9868.2.patch
          25 kB
          NING DING
        4. HDFS-9868.10.patch
          49 kB
          Xiao Chen
        5. HDFS-9868.1.patch
          24 kB
          NING DING
        6. HDFS-9868.09.patch
          44 kB
          Xiao Chen
        7. HDFS-9868.08.patch
          40 kB
          Xiao Chen
        8. HDFS-9868.07.patch
          34 kB
          Xiao Chen
        9. HDFS-9868.06.patch
          34 kB
          Xiao Chen
        10. HDFS-9868.05.patch
          29 kB
          Xiao Chen

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            iceberg565 NING DING
            iceberg565 NING DING

            Dates

              Created:
              Updated:

              Slack

                Issue deployment