Mahout / MAHOUT-780

job jars fail on OS X due to case-insensitive name conflict on 'license'

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.6
    • Component/s: build
    • Labels:
      None
    • Environment:

      Mac OS X

      Description

      Dan explains it well below. The workaround is to make the 'license' folder into a 'licenses' folder, but where does this come from? Anyone know?

      With SVN 'At revision 1152597.', and freshly rebuilt:

      jar -tvf /Users/danbri/Documents/workspace/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar | grep -i license

      19355 Sat Feb 26 19:16:30 CET 2011 META-INF/LICENSE.txt
      11358 Sun Apr 11 21:45:12 CEST 2010 META-INF/LICENSE
      1596 Mon Dec 20 15:47:30 CET 2010 LICENSE
      0 Sun Dec 01 11:57:24 CET 2002 license/
      4083 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-documentation.txt
      3595 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-software.txt
      804 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.sax.txt
      2827 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.txt
      1274 Sun Dec 01 11:57:24 CET 2002 license/README.dom.txt
      715 Sun Dec 01 11:57:24 CET 2002 license/README.sax.txt
      672 Sun Dec 01 11:57:24 CET 2002 license/README.txt

      This situation seems to quite confuse Hadoop. The underlying OSX
      filesystem doesn't support file and directory names differing only by
      case; see http://developer.apple.com/library/mac/#documentation/Java/Conceptual/Java14Development/01-JavaOverview/JavaOverview.html
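A quick way to spot this kind of clash before handing a jar to Hadoop is to look for entry paths that differ only by case. The following is a minimal sketch, not part of Mahout or Hadoop; `case_collisions` is a hypothetical helper name, and it only uses Python's standard `zipfile` module.

```python
import io
import zipfile
from collections import defaultdict


def case_collisions(jar):
    """Return groups of jar paths that differ only by letter case.

    `jar` is a path or file-like object readable by zipfile.ZipFile.
    Each entry and every one of its parent directories is considered,
    so a file 'LICENSE' and a directory 'license/' are reported together.
    """
    groups = defaultdict(set)
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            parts = name.rstrip("/").split("/")
            # Register the entry itself and each ancestor directory.
            for i in range(1, len(parts) + 1):
                path = "/".join(parts[:i])
                groups[path.lower()].add(path)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    # Tiny in-memory jar that mimics the listing above.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/LICENSE", "")
        zf.writestr("LICENSE", "")
        zf.writestr("license/README.txt", "")
    print(case_collisions(buf))
```

An empty result means no two paths in the jar would map to the same name on a case-insensitive filesystem.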

      mahout lucene.vector --dir solr/data/index/ --output bar/vecs --field
      label --idField id --dictOut bar/dict.out --norm 2

      Running on hadoop, using HADOOP_HOME=/Users/danbri/working/hadoop/hadoop-0.20.2
      HADOOP_CONF_DIR=/Users/danbri/working/hadoop/hadoop-0.20.2/conf
      MAHOUT-JOB: /Users/danbri/Documents/workspace/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar

      Exception in thread "main" java.io.IOException: Mkdirs failed to
      create /tmp/hadoop/hadoop-unjar5018665014541152120/license
      at org.apache.hadoop.util.RunJar.unJar(RunJar.java:48)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

      That Hadoop error message is somewhat unhelpful, especially for those
      who doubt their hadoop knowhow; but technically correct. The
      /tmp/hadoop and its subdirectory exist and are writeable. The problem
      is the specific file/dir names being written into it. That wasn't so
      obvious. So I went chasing around configuring hadoop tmp dirs,
      checking it existed and was writable in local and in hdfs dirs, ...
      then ... I finally, belatedly tried unzipping the jar with 'jar -xvf '
      to see what was special about 'license', and got the same error from
      commandline 'jar' that upset !file.getParentFile().isDirectory() in
      Hadoop's ./src/core/org/apache/hadoop/util/RunJar.java:

      java.io.IOException: license : could not create directory
      at sun.tools.jar.Main.extractFile(Main.java:909)
      at sun.tools.jar.Main.extract(Main.java:852)
      at sun.tools.jar.Main.run(Main.java:242)
      at sun.tools.jar.Main.main(Main.java:1149)

      (this is the same error that trips up hadoop)
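To make the failure mode concrete, here is a small model of unpacking entries in order onto a case-insensitive filesystem. It is an illustration only, not Hadoop's or the JDK's actual code; `first_unjar_failure` is a hypothetical name, and the model covers just the case seen here (a regular file written first, then a directory whose name differs only by case).

```python
def first_unjar_failure(entry_names):
    """Simulate unpacking entries, in order, on a case-insensitive filesystem.

    Returns the first directory that cannot be created because a regular
    file with the same case-folded name was already written, else None.
    """
    files = set()  # lower-cased paths already written as regular files
    for name in entry_names:
        is_dir = name.endswith("/")
        parts = name.rstrip("/").split("/")
        dirs = parts if is_dir else parts[:-1]
        # Every directory on the path must be creatable (the mkdirs step).
        for i in range(1, len(dirs) + 1):
            path = "/".join(dirs[:i])
            if path.lower() in files:
                return path
        if not is_dir:
            files.add(name.lower())
    return None


if __name__ == "__main__":
    listing = [
        "META-INF/LICENSE.txt",
        "META-INF/LICENSE",
        "LICENSE",
        "license/",
        "license/LICENSE.txt",
    ]
    print(first_unjar_failure(listing))
```

With the entry order from the listing above, the model stops at the 'license' directory, which is the path named in both stack traces.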

      This seems to be reproducible; I did an svn up, mvn clean and mvn
      package, let all the tests run and pass, and confirm that the same
      thing happens.

      I compared an early job .jar from 0.5, where all was fine. Any
      suggestions for best quick fix?

        Activity

        Dan Brickley added a comment -

        I cleaned out ~/.m2/ and made a totally fresh checkout, to confirm it's reproducible here.

        While those tests re-run, I've unpacked .jar files from my old Maven .m2 tree, and matching on timestamps I see

        1596 Mon Dec 20 15:47:30 CET 2010 LICENSE

        seems to come from ./com/github/stephenc/high-scale-lib/high-scale-lib/1.1.2/high-scale-lib-1.1.2.jar

        and "0 Sun Dec 01 11:57:24 CET 2002 license/" from ./xml-apis/xml-apis/1.0.b2/xml-apis-1.0.b2.jar

        This doesn't explain how or why they get superimposed in the same filetree, though. Maven Assembly plugin? I searched around for others with similar problems, found nothing, which points at some local quirk in my setup. I wouldn't be surprised if cleaning .m2 fixed things.
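Instead of matching timestamps by hand, the contributing jars can be found by scanning a local repository for any jar with a top-level entry of a given name, ignoring case. This is a rough sketch under the assumption that the dependencies live as plain .jar files under one directory tree; `jars_containing` is a hypothetical helper.

```python
import os
import zipfile


def jars_containing(root, wanted):
    """Yield (jar_path, entry_name) for each jar under `root` that has an
    entry whose top-level path component equals `wanted`, ignoring case."""
    wanted = wanted.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, filename)
            try:
                with zipfile.ZipFile(jar_path) as zf:
                    names = zf.namelist()
            except (zipfile.BadZipFile, OSError):
                continue  # skip corrupt or unreadable files
            for name in names:
                if name.split("/", 1)[0].lower() == wanted:
                    yield jar_path, name
                    break  # one hit per jar is enough
```

For example, iterating over `jars_containing(os.path.expanduser("~/.m2/repository"), "license")` would list every cached jar that ships a root-level LICENSE file or license/ directory.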

        Dan Brickley added a comment -

        Confirmed: I wiped out .m2/ and rebuilt a totally fresh checkout, ... have same issue as above still.

        jar -xvf [...]/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar
        created: javax/xml/
        created: javax/xml/parsers/
        created: javax/xml/transform/
        created: javax/xml/transform/dom/
        created: javax/xml/transform/sax/
        created: javax/xml/transform/stream/
        java.io.IOException: license : could not create directory
        at sun.tools.jar.Main.extractFile(Main.java:909)
        at sun.tools.jar.Main.extract(Main.java:852)
        at sun.tools.jar.Main.run(Main.java:242)
        at sun.tools.jar.Main.main(Main.java:1149)

        jar -tvf [...]/trunk/examples/target/mahout-examples-0.6-SNAPSHOT-job.jar | grep -i license
        19355 Sat Feb 26 19:16:30 CET 2011 META-INF/LICENSE.txt
        11358 Sun Apr 11 21:45:12 CEST 2010 META-INF/LICENSE
        1596 Mon Dec 20 15:47:30 CET 2010 LICENSE
        0 Sun Dec 01 11:57:24 CET 2002 license/
        4083 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-documentation.txt
        3595 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.dom-software.txt
        804 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.sax.txt
        2827 Sun Dec 01 11:57:24 CET 2002 license/LICENSE.txt
        1274 Sun Dec 01 11:57:24 CET 2002 license/README.dom.txt
        715 Sun Dec 01 11:57:24 CET 2002 license/README.sax.txt
        672 Sun Dec 01 11:57:24 CET 2002 license/README.txt

        Matching these entries by datestamp finds the same items in both my old and my brand-new .m2/ repos:

        jar -tvf .m2/repository/com/github/stephenc/high-scale-lib/high-scale-lib/1.1.2/high-scale-lib-1.1.2.jar | grep LICENSE
        1596 Mon Dec 20 15:47:30 CET 2010 LICENSE

        jar -tvf .m2/repository/xml-apis/xml-apis/1.0.b2/xml-apis-1.0.b2.jar | grep -i license
        0 Sun Dec 01 11:57:24 CET 2002 license/

        mvn --version
        Apache Maven 3.0.3 (r1075438; 2011-02-28 18:31:09+0100)

        Nearby:

        Similar woes elsewhere: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8?show_docid=cc6eca3ef9f80ff8
        Hadoop diagnostics issue: https://issues.apache.org/jira/browse/HADOOP-6614
        Dan Brickley added a comment -

        I tried adding META-INF/LICENSE to the examples/src/main/assembly/job.xml excludes (it already had <exclude>META-INF</exclude> and <exclude>META-INF/**</exclude>), and noticed this in the build log:

        [INFO] --- maven-assembly-plugin:2.2:single (job) @ mahout-examples ---
        [INFO] Reading assembly descriptor: src/main/assembly/job.xml
        [WARNING] Missing POM for jfree:jcommon:jar:1.0.16
        [WARNING] The following patterns were never triggered in this artifact exclusion filter:
        o 'META-INF'
        o 'META-INF/LICENSE'
        o 'META-INF/**'

        http://stackoverflow.com/questions/1034347/maven-assembly-ignores-parent-dependencies and
        http://jira.codehaus.org/browse/MASSEMBLY-194
        http://jira.codehaus.org/browse/MASSEMBLY-223 seem potentially relevant.

        From those, I tried adding <useTransitiveDependencies>true</useTransitiveDependencies> inside examples/src/main/assembly/job.xml:<dependencySet>; and I tried making sure <exclude>META-INF/LICENSE</exclude> was the first exclude inside <excludes>; bad guesses both, it seems.

        Dan Brickley added a comment -

        Verbose build log here: http://danbri.org/tmp/mahout/mahout-780-buildlog.txt (example bad jar in same dir, but you get the idea).

        maybe related: http://jamesbetteley.wordpress.com/2011/05/06/maven-assembly-plugin-inheritance-headache/

        Sean Owen added a comment -

        Scratch last comment. I have the same result as Dan.

        I think the answer is that those things are supposed to be there since we do repackage all the dependent jars as-is, and that's a good thing. We want to preserve licenses and data files.

        So the real question is, how do you re-map a file during packaging? I think we want to redirect "LICENSE" to "license/LICENSE".
        Benson, do you have pointers before I dig into Encyclopedia Mavenica?

        Benson Margulies added a comment -

        Ignore my previous comment. I'll have time this evening to look at this.

        Dan Brickley added a comment -

        Glad(-ish) that it's not just me.

        Can you confirm you see the same warnings when running mvn with -e -X ?

        If the exclude patterns are not firing, I guess any remapping would fail too. From reading around, it seems to be related to chaining together a hierarchy of POMs...

        Sean Owen added a comment -

        OK, some answers.

        First, you're right that this conflict is due to the xml-apis jar and the high-scale-lib jar. (Neither is to blame in any sense.) The latter comes in via the Cassandra dependency, which I added to mahout-integration just last week; that's why this has cropped up now. It doesn't affect core or math, of course. It should affect integration, but doesn't, since there is no job jar for integration. It does affect examples since there is a job jar, and it depends on integration.

        The good news is there should be no need to package this high-scale-lib dependency anyway. It's not needed by anything in examples. So, just exclude it with a note. See my patch. It does get the conflict out of the examples job jar.

        Benson Margulies added a comment -

        Sean, is this completely under control?

        Dan Brickley added a comment -

        Sean's fix works for me. But it seems to leave things open for the same problem to re-occur in future if another package has a clashing file/dir name. Maybe that's fine; googling on

        mahout "Exception in thread "main" java.io.IOException: Mkdirs failed to create"

        ...already finds this JIRA entry, which at least helps with diagnosis if it happens again. First impressions with this one are misleading; I initially assumed Hadoop was having problems writing to local or hdfs /tmp/, before realising that the .jar was the problem.

        Sean Owen added a comment -

        I think this is a completely legit workaround. It's not a "real" fix in that there's nothing preventing it from happening again, possibly in a way where we can't just nix the dependency. Perhaps cross that bridge when we come to it. It is probably quite rare: almost everything in a jar is in its own "namespace" of directories. This probably only happens at the root or in META-INF, and only if you have names that differ in case alone, and only on OS X with case-insensitive HFS+, etc...
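Whether a given machine is exposed to this at all depends on the filesystem backing the unpack directory. A simple probe, sketched here with only the standard library (`is_case_insensitive` is a hypothetical helper, not something Hadoop provides), is to create a file under one spelling and ask for it under another:

```python
import os
import tempfile


def is_case_insensitive(directory=None):
    """Return True if `directory` (default: the system temp dir) treats
    names that differ only by case as the same file."""
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        probe = os.path.join(tmp, "license")
        with open(probe, "w"):
            pass  # create an empty file under the lower-case name
        # On a case-insensitive filesystem the upper-case spelling
        # resolves to the file that was just created.
        return os.path.exists(os.path.join(tmp, "LICENSE"))


if __name__ == "__main__":
    print("case-insensitive temp dir:", is_case_insensitive())
```

The result applies only to the directory probed; other volumes on the same machine may behave differently.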

        Dan Brickley added a comment -

        Yup, not worth trying to bullet-proof against such unlikely scenarios. Seems fine to close with your patch (& thanks!).

        Hudson added a comment -

        Integrated in Mahout-Quality #972 (See https://builds.apache.org/job/Mahout-Quality/972/)
        MAHOUT-780 exclude unnecessary transitive dependency, creating a funny case-insensitive-file-system problem in unpacking the combined job jar

        srowen : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1153473
        Files :

        • /mahout/trunk/examples/src/main/assembly/job.xml
        William McNeill added a comment -

        Take a look at the Maven shade plugin.

        I had the same problem when trying to build an uber-jar for a different program. A "LICENSE" file conflicted with a "license" directory, which caused a bug on my Mac. I fixed it by building the uber-jar with the Maven shade plugin, which allows you to customize the construction of uber-jars.

        An example of how I fixed my problem is in https://github.com/wpm/Hadoop-GATE. The following part of the pom.xml in that project strips the offending LICENSE file from one of the dependency jars.

        <filter>
          <artifact>org.codehaus.woodstox:wstx-lgpl</artifact>
          <excludes>
            <exclude>LICENSE</exclude>
          </excludes>
        </filter>
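For a jar that has already been built, the effect of such an exclude can be approximated by rewriting the archive without the offending entry. This is a post-processing stand-in under that assumption, not how the shade plugin works internally; `copy_jar_without` is a hypothetical helper.

```python
import zipfile


def copy_jar_without(src, dst, excluded):
    """Copy zip/jar `src` to `dst`, skipping entries named in `excluded`.

    Matching is exact and case-sensitive, so excluding 'LICENSE' leaves
    'license/...' entries alone. Returns the names that were dropped.
    """
    excluded = set(excluded)
    dropped = []
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename in excluded:
                dropped.append(info.filename)
                continue
            # Reuse the original entry metadata and copy its bytes.
            zout.writestr(info, zin.read(info.filename))
    return dropped
```

Calling `copy_jar_without("job.jar", "job-fixed.jar", ["LICENSE"])` (file names here are placeholders) would drop the root-level LICENSE file while keeping the license/ directory intact.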


          People

          • Assignee: Sean Owen
          • Reporter: Sean Owen
          • Votes: 0
          • Watchers: 1