  1. Hadoop Common
  2. HADOOP-10007

distcp / mv is not working on ftp


Details

    Description

      I'm just trying to back up some files to our FTP server.

      hadoop distcp hdfs:///data/ ftp://user:pass@server/data/

      The job fails after a few minutes with:

      Task TASKID="task_201308231529_97700_m_000002" TASK_TYPE="MAP" TASK_STATUS="FAILED" FINISH_TIME="1380217916479" ERROR="java.io.IOException: Cannot rename parent(source): ftp://x:x@backup2/data/, parent(destination): ftp://x:x@backup2/data/
      at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:557)
      at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:522)
      at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:154)
      at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
      at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
      at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
      at org.apache.hadoop.mapred.Task.commit(Task.java:1000)
      at org.apache.hadoop.mapred.Task.done(Task.java:870)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:329)
      at org.apache.hadoop.mapred.Child$4.run" TASK_ATTEMPT_ID="" .

      I googled a bit and added

      fs.ftp.host = backup2
      fs.ftp.user.backup2 = user
      fs.ftp.password.backup2 = password

      to core-site.xml, then I was able to execute:

      hadoop fs -ls ftp:///data/
      hadoop fs -rm ftp:///data/test.file

      but as soon as I try

      hadoop fs -mv file:///data/test.file ftp:///data/test2.file
      mv: `ftp:///data/test.file': Input/output error

      I enabled debug-logging in our ftp-server and got:

      Sep 27 15:24:33 backup2 ftpd[38241]: command: LIST /data
      Sep 27 15:24:33 backup2 ftpd[38241]: <--- 150
      Sep 27 15:24:33 backup2 ftpd[38241]: Opening BINARY mode data connection for '/bin/ls'.
      Sep 27 15:24:33 backup2 ftpd[38241]: <--- 226
      Sep 27 15:24:33 backup2 ftpd[38241]: Transfer complete.
      Sep 27 15:24:33 backup2 ftpd[38241]: command: CWD ftp:/data
      Sep 27 15:24:33 backup2 ftpd[38241]: <--- 550
      Sep 27 15:24:33 backup2 ftpd[38241]: ftp:/data: No such file or directory.
      Sep 27 15:24:33 backup2 ftpd[38241]: command: RNFR test.file
      Sep 27 15:24:33 backup2 ftpd[38241]: <--- 550

      It looks like the CWD command is generated incorrectly: Hadoop tries to change into "ftp:/data" (the URI string, scheme included), but it should use the plain server path "/data".
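
To illustrate the suspected cause, here is a minimal sketch using only `java.net.URI`. It is not the actual `FTPFileSystem` code; the class `FtpCwdPathSketch` and its two helper methods are hypothetical names made up for this example. It assumes the directory is held as a URI and shows the difference between sending the whole URI string and sending only its path component as the CWD argument.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class FtpCwdPathSketch {

    // Hypothetical helper: the full string form of the URI, scheme included.
    // This matches what the ftpd debug log suggests is being sent.
    static String fullUriString(URI dir) {
        return dir.toString();
    }

    // Hypothetical helper: only the path component, which is what an
    // FTP server expects as the argument to CWD.
    static String serverPath(URI dir) {
        return dir.getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        // A directory URI with a scheme but no authority, as with ftp:///data/
        // when the host comes from fs.ftp.host in core-site.xml.
        URI parent = new URI("ftp", null, "/data", null);

        System.out.println("CWD " + fullUriString(parent)); // CWD ftp:/data
        System.out.println("CWD " + serverPath(parent));    // CWD /data
    }
}
```

If the rename path in Hadoop builds the CWD argument from the URI's string form, that would explain the 550 reply in the log above; using the path component instead would be the expected fix, though this has not been verified against the Hadoop source here.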

          People

            Assignee: Unassigned
            Reporter: Fabian Zimmermann (f.zimmermann)
            Votes: 0
            Watchers: 7