Hadoop Common / HADOOP-15838

Copying files from SFTP to HDFS using DistCp fails with an error


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.5.0, 2.7.2
    • Fix Version/s: 2.7.5
    • Component/s: tools/distcp
    • Environment: Hadoop 2.5.0 + Kerberos

      Description

      1. When I run the command:

      hadoop distcp sftp://mysftp:1qaz_@WSX@192.168.1.44:/upload/hosts /tmp/JOY

      I get the following error:

      2018-10-10 22:31:37,799 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
      2018-10-10 22:31:39,055 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[sftp://mysftp:1qaz_@WSX@192.168.1.44:/upload/hosts], targetPath=/tmp/JOY, targetPathExists=false}
      2018-10-10 22:31:39,365 ERROR tools.DistCp: Exception encountered
      java.io.IOException: Invalid host specified
              at org.apache.hadoop.fs.sftp.SFTPFileSystem.initialize(SFTPFileSystem.java:67)
              at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
              at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
              at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2643)
              at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2625)
              at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
              at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
              at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:76)
              at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
              at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
              at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
              at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
              at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
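
The "Invalid host specified" message is consistent with how java.net.URI treats an unescaped '@' inside the password. The sketch below is not Hadoop code; it only parses the same URI string with the JDK, and the class and method names (RawAtSignUriDemo, hostOf) are made up for illustration. It assumes the host is ultimately read through URI.getHost(); Hadoop's Path may build its URI differently, so treat this as a plausible explanation rather than a confirmed root cause.

```java
import java.net.URI;

public class RawAtSignUriDemo {

    /** Returns the host that java.net.URI extracts, or null if it finds none. */
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        // The URI from command 1: the password contains a literal '@'.
        String raw = "sftp://mysftp:1qaz_@WSX@192.168.1.44:/upload/hosts";
        URI parsed = URI.create(raw);

        // With two '@' characters the authority cannot be split into
        // userinfo@host:port, so the host and user info are left unset.
        System.out.println("authority = " + parsed.getAuthority());
        System.out.println("userInfo  = " + parsed.getUserInfo());
        System.out.println("host      = " + hostOf(raw));
    }
}
```

If the host prints as null here, any code that requires a non-null host from the parsed URI would reject the path before a connection is attempted.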
      

       

      2. When I run the command:

      hadoop distcp sftp://mysftp:1qaz_%40WSX@192.168.1.44:/upload/hosts /tmp/JOY

      I get the following error:

      2018-10-10 22:31:59,909 INFO util.KerberosUtil: Using principal pattern: HTTP/_HOST
      2018-10-10 22:32:01,286 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[sftp://mysftp:1qaz_%40WSX@192.168.1.44:/upload/hosts], targetPath=/tmp/JOY, targetPathExists=false}
      2018-10-10 22:32:02,190 ERROR tools.DistCp: Exception encountered
      java.io.IOException: SSH_MSG_DISCONNECT: 2 Too many authentication failures for mysftp
              at org.apache.hadoop.fs.sftp.SFTPFileSystem.connect(SFTPFileSystem.java:143)
              at org.apache.hadoop.fs.sftp.SFTPFileSystem.getFileStatus(SFTPFileSystem.java:371)
              at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
              at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
              at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)
              at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
              at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
              at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
              at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
              at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
              at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
              at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
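
In this second case the URI gets past host validation and fails only at authentication. The sketch below (plain JDK, hypothetical class name EncodedUserInfoDemo) shows that java.net.URI exposes two views of the user info for the percent-encoded form: a raw one that still contains %40 and a decoded one that contains '@'. Which of the two reaches the SFTP server depends on how the file system reads the URI, which this report does not establish, so the sketch only illustrates the distinction.

```java
import java.net.URI;

public class EncodedUserInfoDemo {

    /** User info exactly as written in the URI (escapes preserved). */
    static String rawUserInfo(String uri) {
        return URI.create(uri).getRawUserInfo();
    }

    /** User info with percent-escapes decoded. */
    static String decodedUserInfo(String uri) {
        return URI.create(uri).getUserInfo();
    }

    public static void main(String[] args) {
        // The URI from command 2: '@' in the password written as %40.
        String encoded = "sftp://mysftp:1qaz_%40WSX@192.168.1.44:/upload/hosts";

        System.out.println("host            = " + URI.create(encoded).getHost());
        System.out.println("rawUserInfo     = " + rawUserInfo(encoded));
        System.out.println("decodedUserInfo = " + decodedUserInfo(encoded));
    }
}
```

If a client were to send the raw form as the password, the server would see 1qaz_%40WSX instead of 1qaz_@WSX and reject it, which would be one possible route to an authentication failure.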

      The SFTP username is mysftp and the password is 1qaz_@WSX. In the second command, the '@' in the password is percent-encoded as %40.
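
For reference, the percent-encoded form used in the second command can be produced programmatically. This is a minimal sketch with hypothetical names (SftpPasswordEncoder, encodeForUserInfo); it uses java.net.URLEncoder, which implements HTML form encoding rather than URI escaping, so the '+' it emits for spaces is rewritten to %20. It only builds the URI string and does not make the DistCp command succeed on its own.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SftpPasswordEncoder {

    /** Percent-encodes a password so it can be placed in a URI's user info. */
    static String encodeForUserInfo(String password) {
        try {
            // URLEncoder turns ' ' into '+'; a literal '+' becomes %2B,
            // so replacing '+' afterwards only affects encoded spaces.
            return URLEncoder.encode(password, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        String user = "mysftp";
        String password = "1qaz_@WSX";
        System.out.println("sftp://" + user + ":" + encodeForUserInfo(password)
                + "@192.168.1.44:/upload/hosts");
    }
}
```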

       

        Attachments

        1. ERROR2_with_hex_passwd.png
          61 kB
          LinJi
        2. ERROR1_with_orignal_passwd.png
          64 kB
          LinJi

            People

            • Assignee: Unassigned
            • Reporter: J.Lin LinJi
            • Votes: 0
            • Watchers: 4

            Time Tracking

            • Original Estimate: 96h
            • Remaining Estimate: 96h
            • Time Spent: Not Specified