Hadoop Common
HADOOP-4864

-libjars with multiple jars broken when client and cluster reside on different OSs

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.19.0
    • Fix Version/s: 0.21.0
    • Component/s: filecache
    • Labels:
      None
    • Environment:

      When your Hadoop job spans OSs (client and cluster on different operating systems).

    • Hadoop Flags:
      Reviewed

      Description

      When submitting a Hadoop job from Windows (Cygwin) to a Linux Hadoop cluster (or vice versa), and when you specify multiple additional jar files via the -libjars flag, Hadoop throws a ClassNotFoundException for any classes located in those additional jars.

      This is caused by the fact that Hadoop uses System.getProperty("path.separator") as the delimiter in the list of jar files passed via -libjars.

      My suggested solution is to use a comma as the delimiter, rather than the path.separator.

      I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames on both Windows and Linux, but the -libjars flag already uses it as the delimiter when listing the additional required jars. So, I figured that if it's already being used as a delimiter externally, it's reasonable to use it internally as well.
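
      To make the failure mode concrete, here is a minimal, standalone Java illustration of the mismatch (not Hadoop code; the class and jar names are just examples): the client joins the jar list with its own path.separator, the cluster splits it with a different one, and the whole list ends up treated as a single, nonexistent classpath entry. A comma round-trips identically on both platforms.

        import java.util.Arrays;

        public class SeparatorMismatch {
          public static void main(String[] args) {
            String[] jars = {"Foo.jar", "Bar.jar"};

            // Windows client: path.separator is ";"
            String writtenOnWindows = String.join(";", jars);

            // Linux tasktracker: path.separator is ":", so the split finds no delimiter
            String[] readOnLinux = writtenOnWindows.split(":");
            System.out.println(Arrays.toString(readOnLinux));   // [Foo.jar;Bar.jar] -> ClassNotFoundException

            // A comma-delimited list is read back the same way on both platforms
            String commaDelimited = String.join(",", jars);
            System.out.println(Arrays.toString(commaDelimited.split(",")));   // [Foo.jar, Bar.jar]
          }
        }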

      Attachments

      1. patch.txt
        1 kB
        Stuart White
      2. patch-4864.txt
        2 kB
        Amareshwari Sriramadasu
      3. patch-4864-1.txt
        3 kB
        Amareshwari Sriramadasu

        Activity

        Stuart White created issue -
        Stuart White added a comment -

        Patch that changes Hadoop's internal delimiter for the list of jars specified via -libjars from System.getProperty("path.separator") to a comma.

        This is because path.separator is platform-specific and therefore does not serve as an appropriate delimiter across platforms.

        Stuart White made changes -
        Attachment patch.txt [ 12395992 ]
        Stuart White made changes -
        Description (original value):
        When submitting a Hadoop job from Windows (Cygwin) to a Linux Hadoop cluster (or vice versa), and when you specify multiple additional jar files via the -libjars flag, Hadoop throws a ClassNotFoundException for any classes located in those additional jars.

        This is caused by the fact that Hadoop uses System.getProperty("path.separator") as the delimiter in the list of jar files passed via -libjars.

        If your job spans platforms, System.getProperty("path.separator") returns a different delimiter on each platform.

        My suggested solution is to use a comma as the delimiter, rather than the path.separator.

        I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames on both Windows and Linux, but the -libjars flag uses it as the delimiter when listing the additional required jars. So, I figured if it's already being used as a delimiter, then it's reasonable to use it internally as well.

        I have a patch that applies my suggested change, but I don't see anywhere to upload it. So, I'll go ahead and create this JIRA and hope that I will have the opportunity to attach the patch later.

        Now, with this change, I can submit hadoop jobs (requiring multiple
        supporting jars) from my Windows laptop (via cygwin) to my 10-node
        Linux hadoop cluster.

        Any chance this change could be applied to the hadoop codebase?

        To recreate the problem I'm seeing, do the following:

        - Setup a hadoop cluster on linux

        - Perform the remaining steps on cygwin, with a hadoop installation
        configured to point to the linux cluster. (set fs.default.name and
        mapred.job.tracker)

        - Extract the tarball. Change into created directory.
         tar xvfz Example.tar.gz
         cd Example

        - Edit build.properties, set your hadoop.home appropriately, then
        build the example.
         ant

        - Load the file Example.in into your dfs
         hadoop dfs -copyFromLocal Example.in Example.in

        - Execute the provided shell script, passing it testID 1.
         ./Example.sh 1
         This test does not use -libjars, and it completes successfully.

        - Next, execute testID 2.
         ./Example.sh 2
         This test uses -libjars with 1 jarfile (Foo.jar), and it completes
        successfully.

        - Next, execute testID 3.
         ./Example.sh 3
         This test uses -libjars with 1 jarfile (Bar.jar), and it completes
        successfully.

        - Next, execute testID 4.
         ./Example.sh 4
         This test uses -libjars with 2 jarfiles (Foo.jar and Bar.jar), and
        it fails with a ClassNotFoundException.

        This behavior only occurs when calling from cygwin to linux or vice
        versa. If both the cluster and the client reside on either linux or
        cygwin, the problem does not occur.

        I'm continuing to dig to see what I can figure out, but since I'm very
        new to hadoop (started using it this week), I thought I'd go ahead and
        throw this out there to see if anyone can help.

        Thanks!
        Description (new value):
        When submitting a Hadoop job from Windows (Cygwin) to a Linux Hadoop cluster (or vice versa), and when you specify multiple additional jar files via the -libjars flag, Hadoop throws a ClassNotFoundException for any classes located in those additional jars.

        This is caused by the fact that Hadoop uses System.getProperty("path.separator") as the delimiter in the list of jar files passed via -libjars.

        My suggested solution is to use a comma as the delimiter, rather than the path.separator.

        I realize comma is, perhaps, a poor choice for a delimiter because it is valid in filenames on both Windows and Linux, but the -libjars flag already uses it as the delimiter when listing the additional required jars. So, I figured that if it's already being used as a delimiter externally, it's reasonable to use it internally as well.
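
        As background to the reproduction steps recorded above: -libjars is only honored when the driver goes through GenericOptionsParser, typically via ToolRunner. Below is a minimal sketch of such a driver for the old mapred API; the class name, argument handling, and job setup are hypothetical and not taken from the attached Example.tar.gz.

          import org.apache.hadoop.conf.Configured;
          import org.apache.hadoop.fs.Path;
          import org.apache.hadoop.mapred.FileInputFormat;
          import org.apache.hadoop.mapred.FileOutputFormat;
          import org.apache.hadoop.mapred.JobClient;
          import org.apache.hadoop.mapred.JobConf;
          import org.apache.hadoop.util.Tool;
          import org.apache.hadoop.util.ToolRunner;

          // Hypothetical driver: ToolRunner/GenericOptionsParser strip -libjars from args
          // and record the listed jars in the configuration before run() is called.
          public class ExampleDriver extends Configured implements Tool {
            public int run(String[] args) throws Exception {
              JobConf job = new JobConf(getConf(), ExampleDriver.class);
              FileInputFormat.setInputPaths(job, new Path(args[0]));
              FileOutputFormat.setOutputPath(job, new Path(args[1]));
              JobClient.runJob(job);
              return 0;
            }

            public static void main(String[] args) throws Exception {
              // e.g.: hadoop jar Example.jar ExampleDriver -libjars Foo.jar,Bar.jar Example.in Example.out
              System.exit(ToolRunner.run(new ExampleDriver(), args));
            }
          }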
        Stuart White made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12395992/patch.txt
        against trunk revision 764031.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/181/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/181/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/181/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/181/console

        This message is automatically generated.

        Amareshwari Sriramadasu added a comment -

        The same change will also be required in the addFileToClassPath and getFileClassPaths methods.

        Amareshwari Sriramadasu added a comment -

        Patch changing the addFileToClassPath and getFileClassPaths methods to also use a comma instead of path.separator.
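
        A sketch of the pattern this kind of change follows is shown below. It is illustrative only, not the attached patch-4864.txt, and the configuration key and holder class are assumptions rather than verified Hadoop internals.

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;

          public class ClassPathHelper {   // hypothetical holder class
            // Append to the configured list with an explicit "," rather than
            // System.getProperty("path.separator"), so client and cluster agree.
            public static void addFileToClassPath(Path file, Configuration conf) {
              String classpath = conf.get("mapred.job.classpath.files");   // key name assumed
              conf.set("mapred.job.classpath.files",
                       classpath == null ? file.toString()
                                         : classpath + "," + file.toString());
            }
          }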

        Amareshwari Sriramadasu made changes -
        Attachment patch-4864.txt [ 12408552 ]
        Amareshwari Sriramadasu made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Amareshwari Sriramadasu added a comment -

        Both testpatch and ant test passed on my machine.

        Amareshwari Sriramadasu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Amareshwari Sriramadasu [ amareshwari ]
        Fix Version/s 0.21.0 [ 12313563 ]
        Amareshwari Sriramadasu added a comment -

        Can somebody please review this?

        Todd Lipcon added a comment -

        I agree in spirit, but while you're changing that code you may as well use conf.setStrings and conf.getStringCollection to deal with the comma-delimiting, etc. This way, if those functions eventually get magical comma-escaping ability, you'll benefit for free.
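
        For reference, a small sketch of the two Configuration helpers mentioned here: setStrings stores its arguments comma-separated, and getStringCollection splits on commas, so the stored form is the same on every platform. The key name is only an example.

          import org.apache.hadoop.conf.Configuration;

          public class StringsDemo {
            public static void main(String[] args) {
              Configuration conf = new Configuration();
              conf.setStrings("mapred.job.classpath.files", "Foo.jar", "Bar.jar");
              // stored internally as "Foo.jar,Bar.jar"
              for (String entry : conf.getStringCollection("mapred.job.classpath.files")) {
                System.out.println(entry);   // prints Foo.jar, then Bar.jar
              }
            }
          }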

        Amareshwari Sriramadasu added a comment -

        I changed getFileClassPaths and getArchiveClassPaths to use getStringCollection. I did not change addFileToClassPath and addArchiveToClassPath, since they append files to the existing configuration value.
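
        For illustration, a getStringCollection-based getFileClassPaths can look like the sketch below. This is not the attached patch-4864-1.txt; the key name and holder class are assumptions.

          import java.util.Collection;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.Path;

          public class ClassPathReader {   // hypothetical holder class
            public static Path[] getFileClassPaths(Configuration conf) {
              // getStringCollection splits the stored value on commas
              Collection<String> entries = conf.getStringCollection("mapred.job.classpath.files");
              if (entries.isEmpty()) {
                return null;
              }
              Path[] paths = new Path[entries.size()];
              int i = 0;
              for (String entry : entries) {
                paths[i++] = new Path(entry);
              }
              return paths;
            }
          }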

        Amareshwari Sriramadasu made changes -
        Attachment patch-4864-1.txt [ 12408681 ]
        Todd Lipcon added a comment -

        How about using something like:

        Collection<String> paths = conf.getStringCollection(key);
        paths.add(toAppend);
        // toArray(new String[0]) yields a String[], matching setStrings(String...)
        conf.setStrings(key, paths.toArray(new String[0]));
        
        Amareshwari Sriramadasu added a comment -

        Todd, we could do that, but I'd rather not here, since other methods such as addCacheFile also have similar code. The patch is consistent with the current code. If you insist, this change can be made for all methods in a separate JIRA.

        Todd Lipcon added a comment -

        Fair enough - I don't feel strongly about it, just figured it would be good for consistency.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12408681/patch-4864-1.txt
        against trunk revision 777594.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/381/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/381/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/381/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/381/console

        This message is automatically generated.

        Todd Lipcon added a comment -

        The contrib tests that failed have been failing for all patches - unrelated to this change.

        Iyappan Srinivasan added a comment -

        I have tested with the patch: a Hadoop command using -libjars with two files separated by a comma, without double quotes around the -libjars parameters, and it passes with this patch. I ran a sleep job and a streaming job.

        I ran the same without the patch and it fails at the map phase with a ClassNotFoundException.

        Amareshwari Sriramadasu added a comment -

        Thanks Iyappan for testing this out.

        Devaraj Das added a comment -

        I just committed this. Thanks, Amareshwari!

        Devaraj Das made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Amareshwari Sriramadasu added a comment -

        Can this be committed to branch-0.20? I verified that the same patch applies to branch 0.20 as well.

        Amareshwari Sriramadasu added a comment -

        The same patch applies to Yahoo! distribution branch 0.20 also.

        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                     Time In Source Status   Execution Times   Last Executer             Last Execution Date
        Patch Available -> Open        39d 6h 57m               1                 Amareshwari Sriramadasu   20/May/09 04:00
        Open -> Patch Available        118d 17h 27m             2                 Amareshwari Sriramadasu   20/May/09 04:01
        Patch Available -> Resolved    8d 6h 43m                1                 Devaraj Das               28/May/09 10:44
        Resolved -> Closed             453d 9h 49m              1                 Tom White                 24/Aug/10 20:34

          People

          • Assignee:
            Amareshwari Sriramadasu
          • Reporter:
            Stuart White
          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Original Estimate: 1h
              Remaining Estimate: 1h
              Time Spent: Not Specified
