Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.16.3
    • Fix Version/s: 0.18.0
    • Component/s: util
    • Labels:
      None
    • Release Note:
      Introduced an FTPFileSystem backed by Apache Commons FTPClient to directly store data into HDFS.

      Description

      An FTP client that stores content directly into HDFS allows data from FTP servers to be written straight into HDFS instead of first being copied locally and then uploaded. The benefit is apparent from an administrative perspective: large datasets can be pulled from FTP servers with minimal human intervention.
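As a sketch of how such a filesystem is typically wired up, a hypothetical core-site.xml fragment is shown below. The property name follows Hadoop's usual fs.SCHEME.impl convention; the exact property and class names are assumptions, not confirmed by this issue.

```xml
<!-- Hypothetical core-site.xml fragment: maps the ftp:// scheme to the
     new FileSystem implementation (names follow Hadoop conventions). -->
<property>
  <name>fs.ftp.impl</name>
  <value>org.apache.hadoop.fs.ftp.FTPFileSystem</value>
</property>
```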

      1. FTPJars.tar.gz
        936 kB
        Ankur
      2. Hadoop-3246-latest.patch
        41 kB
        Ankur
      3. HADOOP-3246.patch
        38 kB
        Doug Cutting

        Activity

        Hudson added a comment - Integrated in Hadoop-trunk #509 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/509/)
        Doug Cutting added a comment -

        We want to keep these out of lib/, since they're not a released binary that we want Hadoop users to use. We include them in svn for testing only.

        Tsz Wo Nicholas Sze added a comment -

        This issue introduces a few jar files for testing. They are located in src/test/lib, i.e., under src. In order to compile the tests in my Eclipse project, I have to add them to the build path manually. Usually, jar files can be found in lib, so these jar files are not obvious to find. I could not find them until I read the patch in this issue. Could we move these jar files to lib?

        Ankur added a comment -

        Attaching new required Jar files created after compiling Mina FTP server jars with Java 1.5.

        Doug Cutting added a comment -

        I just committed this. Thanks, Ankur!

        Ankur added a comment -
        • Added FtpTests(), which contains the code for running the FTPFileSystem test.
        • Modified TestFTPFileSystem to check for the FTPServer class and conditionally execute the test.

        Hope the patch is now in good shape to get committed.

        Doug Cutting added a comment -

        > I am just thinking whether in the testFtp() method we should check for just one or all of the classes

        I think one is enough. If all jars are present, as from subversion, tests will run correctly. If no jars are present, as in a release, tests will be skipped. If only some jars are present, an unsupported configuration, tests will crash. That seems fine to me.

        Ankur added a comment -

        That will work. I am just thinking whether in the testFtp() method we should check for just one or all of the FTPServer classes used in the test, since they come from different FTPServer jars?

        Doug Cutting added a comment -

        I think it might be simpler to break the test into two classes: TestFtpFileSystem, which will be run by our ant unit tests, and FtpTests, which actually performs the tests. The former can have a single method, testFtp(), that tries to load the FtpServer class. If it can, it constructs an instance of FtpTests and runs it. If it can't it just prints a warning. Might that work?
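The class-loading guard described here can be sketched in plain Java. The class and method names below are illustrative, not taken from the patch; the check simply asks whether a class can be loaded and skips the real tests when it cannot.

```java
// Minimal sketch of a "run tests only if the FTP server jars are present"
// guard: try to load one server class, and skip with a warning otherwise.
public class ClassPresenceCheck {
    static boolean isPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is always on the classpath; the Mina FTP
        // server class is presumably absent in this environment.
        System.out.println(isPresent("java.util.ArrayList"));
        System.out.println(isPresent("org.apache.ftpserver.FtpServer"));
    }
}
```

In the real test, a `true` result would construct the delegate test class and run it, while `false` would just print a warning and return.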

        Ankur added a comment -

        I assume that we would be compiling the ftp unit tests with the Mina jars and omitting them during release bundling.

        In that case we would have to load the ftpServer classes explicitly ourselves in the ftp unit test and skip the test if the classes are not found. However, it is a little cumbersome, since around 7 ftpServer and Mina classes are referenced.

        Does this fix sound reasonable?

        Doug Cutting added a comment -

        I think we can include this in 0.18. We won't distribute the ftpserver jars in the release package.

        It would be nice to make it so that, if you unpack a release and run unit tests, they pass. So we should probably skip the ftp unit tests when the ftpserver jars are not present.

        Ankur added a comment -

        I definitely agree on this. Has this patch been scheduled for commit? I wonder which release is going to have this, because otherwise we'll have to use an existing release and apply this patch exclusively for our deployment.

        Doug Cutting added a comment -

        The problem with the ftpserver jars is that they're not yet a released Apache product. Providing them as a part of a Hadoop release means that the Hadoop project would be releasing them, and we must perform the diligence required of Apache releases. This is not something we should do lightly.

        So it will be much easier to provide HADOOP-3199 in Hadoop once there has been a release of ftpserver.

        Ankur added a comment -
        • Added the license for SLF4J in the latest patch.
        • Packed the required jars in Hadoop-3246-Required-Jars.tar.gz. To test this patch, the file needs to be unpacked in $HADOOP_HOME.

        Note that moving the Mina and ftp jars to src/test/lib is fine for this issue, but when HADOOP-3199 becomes available these jars will be required under lib/. Until then we can keep them under src/test/lib.

        Doug Cutting added a comment -

        This is looking great!

        Stuff only used for testing should only be on the test classpath. I moved the ftp and mina jar files to src/test/lib and the test username & password to src/test/conf. Do these changes look reasonable?

        It would be simpler for folks to evaluate this patch if the jars were all in a single .tar.gz that, when unpacked in a Hadoop tree, put them in the right places. And any .jar that's not an Apache product should be accompanied by a .LICENSE.txt file containing the license that .jar is under.

        Ankur added a comment -

        This is a rather important piece of functionality, allowing data mobility between a Hadoop cluster and different FTP servers.
        Thus I am increasing the priority, hoping it gets scheduled for commit sooner.

        Ankur added a comment -

        Doug, is there anything else in this patch that can be improved upon?

        Ankur added a comment -

        Looks like I got the fix committed to the Mina code line faster than I expected. So I have modified the test case to set the port to 0, so that the OS can assign a free port. The test case has also been modified to fetch the port correctly from the listener.

        I am attaching the latest Jars and updated patch.

        Ankur added a comment -

        Trying to connect to a randomly generated port might result in getting connected to a different service running on that port, causing confusion in the Commons FTPClient code and, in turn, the FTPFileSystem code.

        As a simple fix I added the following line after bind() call in the MinaListener.start() method.

        setPort(acceptor.getLocalAddress().getPort());

        This sets the port correctly to the actual port the FTP server ended up listening on. As a result we can do a simple thing in our test case like:

        // Look up the listener the embedded server bound, then ask it for
        // the port it actually ended up listening on.
        MinaListener listener = (MinaListener) server.getServerContext().getListener("default");
        int serverPort = listener.getPort();
        ftpFs = FileSystem.get(URI.create("ftp://admin:admin@localhost:" + serverPort), conf);

        This works and I have tested it on my local machine.
        I have created a JIRA issue for Mina - https://issues.apache.org/jira/browse/FTPSERVER-134.

        If acceptable then I can provide the updated ftpserver-core.jar with this fix till the time it gets pushed into their code line.

        Ankur added a comment -

        Is there an issue with trying to run the server on 2021? I understand it's less than ideal, but it is still better to have a defined port for our test service instead of random connection attempts trying to figure out which port Mina actually launched on.

        Ankur added a comment -

        Jar files required for this patch

        Doug Cutting added a comment -

        Thanks! This looks great!

        Can you please attach the required jar files to this issue too?

        Ideally the test service should specify zero as the port, so that the OS allocates a free port; then we can ask which port was allocated and use that. But it doesn't look as though the Mina FTP server permits one to get the port that it's actually launched on. So perhaps we should try a few times in a loop, using a randomly generated port? Otherwise we'll get random test failures when that port's already in use, e.g., by another test run.
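The port-zero behavior itself is plain java.net functionality and can be demonstrated with a ServerSocket, independent of Mina:

```java
import java.net.ServerSocket;

// Binding to port 0 asks the OS to assign a free ephemeral port, which
// can then be read back from the bound socket.
public class EphemeralPort {
    public static void main(String[] args) throws Exception {
        try (ServerSocket ss = new ServerSocket(0)) {
            int port = ss.getLocalPort(); // the port the OS actually chose
            System.out.println(port > 0);
        }
    }
}
```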

        Ankur added a comment -

        The new patch now has a test case that runs standalone using the Apache Mina FTP server embedded in its code. As a result, additional jars from the Apache Mina FTP server are required. The names of all the required jars are present in the patch.

        Ankur added a comment -

        OK, here is the latest patch, implementing the changes Doug suggested. Here are the results of testing on my machine.

        [exec] +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 7 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12381694/ftpFileSystem.patch
        against trunk revision 654315.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 7 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to cause Findbugs to fail.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2431/testReport/
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2431/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2431/console

        This message is automatically generated.

        Doug Cutting added a comment -

        Overall this looks very good! A few minor suggestions:

        • We should accept the username and password in the uri, as ftp://user:pass@host/, with the password optional.
        • It would be nice if folks could specify different usernames and passwords for different hosts in their configuration, perhaps with properties like ftp.user.host.example.com and ftp.password.host.example.com.
        • Rather than keeping a connection open in the FileSystem instance, perhaps we should open and close new connections for each file read, written, renamed, etc? FileSystem.java caches FileSystem implementations forever, and an FTP connection might time out. Also, the working-directory state of the connection makes this not thread-safe, which a connection per request would fix.
        • It would be best if the unit tests ran standalone, without requiring an external FTP server. We might include the Mina FTP server just for testing? We could put the jars somewhere in src/test.
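The user:pass@host form in the first suggestion can be parsed with java.net.URI. A minimal sketch (illustrative code, not taken from the patch):

```java
import java.net.URI;

// Pull the username, optional password, and host out of an ftp:// URI.
public class FtpUriParse {
    public static void main(String[] args) {
        URI uri = URI.create("ftp://user:pass@host.example.com/path");
        String[] creds = uri.getUserInfo().split(":", 2); // password optional
        System.out.println(creds[0]);                               // username
        System.out.println(creds.length > 1 ? creds[1] : "<none>"); // password
        System.out.println(uri.getHost());
    }
}
```

When the password part is absent, the implementation could fall back to per-host configuration properties of the kind suggested above.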
        Ankur added a comment -

        The latest patch implements the FTP client as an FTPFileSystem, as suggested. Here are the results of testing the patch on my local machine (since automated testing is going to fail anyway):

        +1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 7 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.

        Ankur added a comment -

        Chris, thanks for the comments. I'll submit the new patch soon with the suggested modifications.

        > It looks like some testing code accidentally
        Sorry! will take better care next time.

        > If you wanted to keep some of the code ...
        I guess that won't be necessary, as existing tools would be sufficient once this is implemented as an FTP FileSystem.

        Chris Douglas added a comment -

        A few points:

        • +1 for this being a FileSystem
        • Changing javac.version in build.xml from 1.5 to 1.6 should not be part of this patch
        • ftp.server.host, ftp.server.username, and ftp.server.password should be o.a.h.conf.Configuration properties, not system properties. Conditionally running the unit tests based on a switch in build.xml is also not consistent with the existing unit tests; all the changes to build.xml should probably be reverted.
        • ftpClient.sh doesn't seem necessary; it certainly won't be if this is a FileSystem
        • It looks like some testing code accidentally made it into the patch, in FtpShell:
          +    conf.set("fs.default.name", "hdfs://agoel-pc:9000");
          +    conf.set("fs.hdfs.impl", "org.apache.hadoop.dfs.DistributedFileSystem");
          +    conf.set("hadoop.tmp.dir", "/tmp/hadoop-aankurgoel");
          
        • Process trivia: CHANGES.txt is added by the committer when the patch goes in. The "Release Notes" field in JIRA is filled out by the contributor
        • FtpClient::EraserThread is a heavyweight way to hide the password. Unfortunately, java.io.Console::readPassword is only in Java 1.6, and it disables echo via a (platform-dependent) native call, so it's not clear how one would do this in Java 1.5. Leaving this as a configuration property is probably sufficient until we can move to Java 1.6; as a FileSystem, it needs to be a Configuration property, anyway.
        • If you wanted to keep some of the code for an FTP client separate from existing tools, take a look at o.a.h.util.ToolBase to pick up some of the generic option parsing common to most Hadoop utilities.
        Doug Cutting added a comment -

        > Wouldn't this be done better as a FileSystem?

        +1. That makes sense to me. If the goal is to load data from FTP servers into HDFS, or vice versa, then an FTP FileSystem implementation would permit using both existing tools like 'bin/hadoop fs' and distcp.

        Owen O'Malley added a comment -

        Wouldn't this be done better as a FileSystem? Then you could use distcp to copy to or from an FTP server.

        Ankur added a comment -

        So I tested the patch locally after adding the jars to the test environment manually and disabling the modification check in the 'test-patch.sh' script.
        Here's the report on my machine

        -1 overall.
        [exec]
        [exec] @author +1. The patch does not contain any @author tags.
        [exec]
        [exec] tests included +1. The patch appears to include 9 new or modified tests.
        [exec]
        [exec] javadoc +1. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] javac +1. The applied patch does not generate any new javac compiler warnings.
        [exec]
        [exec] findbugs -1. The patch appears to introduce 1 new Findbugs warnings.

        The Findbugs warning is due to a couple of System.exit() calls in the code, which, even though bad practice, is still acceptable in my opinion for a shell application like this. Still, I can go ahead and change this once I get some review comments.

        Ankur added a comment -

        The javadoc, findbugs, and unit-test errors will go away once the additional required jars (commons-net-1.4.1.jar and oro-2.0.8.jar) are included in the Hadoop lib directory.
        I am assuming this does not happen in automated patch testing, as the same errors surfaced in my local patch testing when following the patch-test instructions on the wiki.

        I tried adding the jars manually to the clean environment, but then the testing would complain: 'Cannot test patch in a modified environment'.

        Any suggestions on a workaround for local patch testing?

        Show
        Ankur added a comment - Javadoc, findbugs, unit-test errors will go once the addtional required JARs (commons-net-1.4.1.jar and oro-2.0.8.jar) are included in the hadoop lib directory. I am assuming that this does not happen in automated patch testing as the same errors surfaced in my local patch testing following the patch-test instructions mentioned on wiki. I tried adding the JARS manually to the clean environment but then the testing would complain saying 'Cannot test patch in a modified environment' Any suggestions on workaround for local patch testing ?
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12380604/ftpClient.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included +1. The patch appears to include 9 new or modified tests.

        javadoc -1. The javadoc tool appears to have generated 1 warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs -1. The patch appears to cause Findbugs to fail.

        core tests -1. The patch failed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2294/testReport/
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2294/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2294/console

        This message is automatically generated.

        Ankur added a comment -

        The patch implements the ftpclient as a standalone shell. Most basic commands are implemented:
        get, put, delete, ls, cd, pwd, lls (DFS), lpwd (DFS), lcd (DFS).

        An important thing to note is that get, put and delete are implemented so that they work on multiple files, so no separate mget or mput is required.

        Future work:
        ==========
        Suggestions are welcome on:

        1. More commands that we would like to include in ftpclient
        2. Multi-threaded I/O for faster get and put
        3. Help improvement
        4. Any other feature/modification

        Ankur added a comment -

        For the patch to work, the following additional JARs are required:

        • commons-net-1.4.1.jar
        • oro-2.0.8.jar

        These can be downloaded from
        http://download.nextag.com/apache/commons/net/binaries/commons-net-1.4.1.tar.gz

        Ankur added a comment -

        I added a couple of JAR files in my patch (commons-net-1.4.1.jar and oro-2.0.8.jar), but svn diff wouldn't add the jar contents to the patch correctly. All it says in the patch is 'Cannot display the contents of the binary file'. Consequently, when I try to test the patch on my local machine it fails, as it is unable to add the required binary files.

        I am using CollabNet's SVN command-line client, 1.4.6 for Red Hat Linux.

        Can someone suggest a workaround, as this is preventing me from submitting the patch?

        Thanks

        Ankur added a comment -

        > To be clear...

        Yes, these are existing FTP servers, not HDFS systems.

        Doug Cutting added a comment -

        > At present we are faced with the issue of our data lying in different remote FTP server locations.

        To be clear: these are existing FTP servers, not remote HDFS systems that you wish to access over FTP? Because, if you want to transfer data from a remote HDFS system, then the HFTP scheme permits this over HTTP or HTTPS.

        Enis Soztutar added a comment -

        Oops, I missed that this issue will track an FTP client. Reopening the issue.

        Ankur added a comment -

        Even though the two issues look the same, they are different.

        HADOOP-3199 is about an FTP server that provides FTP access to data in HDFS. Any FTP client would then be able to access HDFS data through FTP.

        This issue is about an FTP client that talks to remote FTP server(s), pulls data from them and stores it directly into HDFS.

        At present we are faced with the issue of our data lying in different remote FTP server locations. Pulling a lot of data from different locations is a lot of manual work, including fetching data over FTP, storing it locally and then putting it into HDFS. This is cumbersome, especially if the data is too large to fit into local storage.

        This utility essentially provides the following benefits:
        1. The steps of 'pull data from FTP server', 'store locally', 'transfer to HDFS' and 'delete local copy' are converted into one step: 'pull data and store into HDFS'.
        2. No need to worry about lack of local storage, as data goes directly into HDFS.
        3. Can be used to run a batch of commands that pull data from different FTP servers.

        All of this greatly simplifies administrative tasks.
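
        The 'pull data and store into HDFS' step amounts to streaming each chunk received from the FTP data connection straight into the HDFS output stream, with no local staging file. A rough sketch of that pass-through (hypothetical Python using the stdlib ftplib as a stand-in for the Java Commons Net FTPClient; the HDFS handle in the usage comment is an assumption, not an actual API):

        ```python
        def stream_ftp_to_sink(ftp, remote_path, sink, blocksize=8192):
            """Stream a remote FTP file into an already-open sink (e.g. an
            HDFS output stream) without staging a local copy."""
            # ftplib's retrbinary invokes the callback once per received
            # chunk, so data flows straight from the FTP socket into the sink.
            ftp.retrbinary("RETR " + remote_path, sink.write, blocksize)

        # Usage sketch (server and HDFS handle are hypothetical):
        # from ftplib import FTP
        # ftp = FTP("ftp.example.com"); ftp.login("user", "pass")
        # with hdfs_out_stream as sink:          # hypothetical HDFS output stream
        #     stream_ftp_to_sink(ftp, "pub/file.bin", sink)
        ```

        Because only one chunk is in flight at a time, local storage never needs to hold the whole file, which is what removes the 'too large to fit into local storage' constraint.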

        +1 for marking this as 'Not Duplicate'

        Enis Soztutar added a comment -

        duplicate of HADOOP-3199


          People

          • Assignee: Ankur
          • Reporter: Ankur
          • Votes: 1
          • Watchers: 9