Solr
  1. Solr
  2. SOLR-3204

solr-commons-csv must not use the org.apache.commons.csv package

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: Build
    • Labels:
      None

      Description

      The solr-commons-csv artifact reused the code from the Apache Commons CSV project but the package wasn't changed to something else than org.apache.commons.csv in the process. This creates a compatibility issue as the Apache Commons team works toward an official release of Commons CSV. It prevents Commons CSV from using its own org.apache.commons.csv package, or forces the renaming of all the classes to avoid a classpath conflict.

      1. solr-csv.patch
        72 kB
        Emmanuel Bourg
      2. SOLR-3204.patch
        2 kB
        Steve Rowe
      3. SOLR-3204.patch
        28 kB
        Uwe Schindler
      4. SOLR-3204.patch
        32 kB
        Uwe Schindler
      5. SOLR-3204.patch
        33 kB
        Uwe Schindler
      6. rule.txt
        0.1 kB
        Uwe Schindler
      7. rule.txt
        0.1 kB
        Uwe Schindler
      8. apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar
        27 kB
        Uwe Schindler
      9. apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar
        25 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          Solr is using a snapshot commons CSV build (commons-csv-1.0-SNAPSHOT-r966014) – solr does not change anything and would prefer to use a release build.

          Has there been class refactorings that make replacing the .jar with the dev build impossible? If so, perhaps we should just update the .jar file we are including.

          Show
          Ryan McKinley added a comment - Solr is using a snapshot commons CSV build (commons-csv-1.0-SNAPSHOT-r966014) – solr does not change anything and would prefer to use a release build. Has there been class refactorings that make replacing the .jar with the dev build impossible? If so, perhaps we should just update the .jar file we are including.
          Hide
          Yonik Seeley added a comment -

          Background here: http://markmail.org/message/lsxtxoetpw7a47uf

          As far as I can determine, this is just a maven meta-data issue?
          i.e. it doesn't seem like we should change java package names, but just make sure that the maven meta-data is completely distinct from any official commons-csv releases? (disclaimer: I still really know nothing of maven, other than it's becoming more a pain all the time

          Show
          Yonik Seeley added a comment - Background here: http://markmail.org/message/lsxtxoetpw7a47uf As far as I can determine, this is just a maven meta-data issue? i.e. it doesn't seem like we should change java package names, but just make sure that the maven meta-data is completely distinct from any official commons-csv releases? (disclaimer: I still really know nothing of maven, other than it's becoming more a pain all the time
          Hide
          Ryan McKinley added a comment -

          Emmanuel, can you confirm this is just a maven issue?

          Can it be solved with adding exclusions?

              <dependency>
                <groupId>org.apache.solr</groupId>
                <artifactId>solr-core</artifactId>
                <version>${solr.version}</version>
                <exclusions>
                  <exclusion>
                    <groupId>org.apache.solr</groupId>
                    <artifactId>solr-commons-csv</artifactId>
                  </exclusion>
                  ...
          

          Obviously the best approach is to work with an official release... but until that exists, what do you suggest? (Actually forking the project and changing the package names seems pretty bad)

          Show
          Ryan McKinley added a comment - Emmanuel, can you confirm this is just a maven issue? Can it be solved with adding exclusions? <dependency> <groupId> org.apache.solr </groupId> <artifactId> solr-core </artifactId> <version> ${solr.version} </version> <exclusions> <exclusion> <groupId> org.apache.solr </groupId> <artifactId> solr-commons-csv </artifactId> </exclusion> ... Obviously the best approach is to work with an official release... but until that exists, what do you suggest? (Actually forking the project and changing the package names seems pretty bad)
          Hide
          Emmanuel Bourg added a comment -

          Hi all and thank you for your quick reply. This is not a "simple" Maven metadata issue. Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects (The Apache Commons teams even starts receiving bug reports for solr-commons-csv). When Commons CSV is officially released there will be a conflict if solr-commons-csv and commons-csv are both brought to the classpath through the transitive dependencies of a project, because they have the same classes in the same package.

          Imagine a project using Solr and doing CSV stuff with Commons CSV, when the CSVParser classes is instantiated it can be either the one from solr-commons-csv or the one from Commons CSV, this is guaranteed to break badly.

          The right approach is to change the package of solr-commons-csv, for example to org.apache.solr.commons.csv, so both solr-commons-csv and Commons CSV can coexist on the classpath without conflict. This can be automated with the Maven Shade plugin.

          Show
          Emmanuel Bourg added a comment - Hi all and thank you for your quick reply. This is not a "simple" Maven metadata issue. Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects (The Apache Commons teams even starts receiving bug reports for solr-commons-csv). When Commons CSV is officially released there will be a conflict if solr-commons-csv and commons-csv are both brought to the classpath through the transitive dependencies of a project, because they have the same classes in the same package. Imagine a project using Solr and doing CSV stuff with Commons CSV, when the CSVParser classes is instantiated it can be either the one from solr-commons-csv or the one from Commons CSV, this is guaranteed to break badly. The right approach is to change the package of solr-commons-csv, for example to org.apache.solr.commons.csv, so both solr-commons-csv and Commons CSV can coexist on the classpath without conflict. This can be automated with the Maven Shade plugin.
          Hide
          Robert Muir added a comment -

          This is a maven issue. solr-commons-csv only exists to maven.

          Otherwise these are just classes inside the war file (essentially an implementation detail of solr).
          It seems to me, no offense, that the solution is to make an official release of commons-csv!

          Then nothing needs to coexist, as we would rather use a release instead anywaay, and all
          grandfather poms,wagons, nexuses,surefires,whatever in maven will be adjusted and everyone is happy.

          Show
          Robert Muir added a comment - This is a maven issue. solr-commons-csv only exists to maven. Otherwise these are just classes inside the war file (essentially an implementation detail of solr). It seems to me, no offense, that the solution is to make an official release of commons-csv! Then nothing needs to coexist, as we would rather use a release instead anywaay, and all grandfather poms,wagons, nexuses,surefires,whatever in maven will be adjusted and everyone is happy.
          Hide
          Emmanuel Bourg added a comment -

          I agree a release of Commons CSV is desirable, but that's not a reason for using the namespace of other projects. The artifact is free to use by anyone from the Maven repository, it's no longer an implementation detail of Solr, people are starting to use it in the wild, and this will lead to classpath conflicts.

          Repackaging a Commons component and renaming its package isn't uncommon, Tomcat does this for DBCP:

          http://www.jarvana.com/jarvana/inspect/org/apache/tomcat/tomcat-dbcp/7.0.23/tomcat-dbcp-7.0.23.jar

          Tomcat is released with a tomcat-dbcp artifact containing the DBCP classes but under the org.apache.tomcat.dbcp package. Solr should adopt the same approach until Commons CSV is officially released.

          Show
          Emmanuel Bourg added a comment - I agree a release of Commons CSV is desirable, but that's not a reason for using the namespace of other projects. The artifact is free to use by anyone from the Maven repository, it's no longer an implementation detail of Solr, people are starting to use it in the wild, and this will lead to classpath conflicts. Repackaging a Commons component and renaming its package isn't uncommon, Tomcat does this for DBCP: http://www.jarvana.com/jarvana/inspect/org/apache/tomcat/tomcat-dbcp/7.0.23/tomcat-dbcp-7.0.23.jar Tomcat is released with a tomcat-dbcp artifact containing the DBCP classes but under the org.apache.tomcat.dbcp package. Solr should adopt the same approach until Commons CSV is officially released.
          Hide
          Robert Muir added a comment -

          seriously, this is no blocker.

          I don't appreciate you reverting my change there.

          now this becomes resolved as won't fix.

          Show
          Robert Muir added a comment - seriously, this is no blocker. I don't appreciate you reverting my change there. now this becomes resolved as won't fix.
          Hide
          Emmanuel Bourg added a comment -

          Robert, should I appreciate that you are negating the issue? This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.

          Show
          Emmanuel Bourg added a comment - Robert, should I appreciate that you are negating the issue? This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.
          Hide
          Uwe Schindler added a comment -

          It is not an issue, as Solr is built with Apache Ant, Apache Maven is not fully supported and only "for convenience". If it works, it works, if not its of minor priority. It definitely does not block a release.

          Show
          Uwe Schindler added a comment - It is not an issue, as Solr is built with Apache Ant, Apache Maven is not fully supported and only "for convenience". If it works, it works, if not its of minor priority. It definitely does not block a release.
          Hide
          Robert Muir added a comment -

          This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please.

          Really? who will stop us?

          Seriously, all solr does is put commons-csv-1.0-SNAPSHOT-r966014.jar in solr/lib (along with LICENSE/NOTICE information).

          We don't use any commons-csv-1.0-RELEASE.jar because one doesnt exist.

          Show
          Robert Muir added a comment - This is considered a blocker because Solr must not make new releases until this is resolved. This is a serious issue, don't take it lightly please. Really? who will stop us? Seriously, all solr does is put commons-csv-1.0-SNAPSHOT-r966014.jar in solr/lib (along with LICENSE/NOTICE information). We don't use any commons-csv-1.0-RELEASE.jar because one doesnt exist.
          Hide
          Torsten Curdt added a comment -

          Folks, how can you not see that this is an issue?

          This is not a maven problem. You are releasing code in the Commons namespace that eventually will clash. I am sure everyone is on board to get this resolved by releasing commons-csv ...but if you do a release beforehand - please be a good ASF citizen and use some kind of shading technique (e.g. using jarjar).

          Just closing this issue as "Won't Fix" is not helping.

          Show
          Torsten Curdt added a comment - Folks, how can you not see that this is an issue? This is not a maven problem. You are releasing code in the Commons namespace that eventually will clash. I am sure everyone is on board to get this resolved by releasing commons-csv ...but if you do a release beforehand - please be a good ASF citizen and use some kind of shading technique (e.g. using jarjar). Just closing this issue as "Won't Fix" is not helping.
          Hide
          Emmanuel Bourg added a comment -

          Really? who will stop us?

          Your morality, kindness and innate sense of teamwork among Apache developers ?

          You may have not noticed that solr-commons-csv is in the central Maven repository, it's not just secretly included in the solr/lib directory:

          http://search.maven.org/#artifactdetails|org.apache.solr|solr-commons-csv|3.5.0|jar

          That means two things:

          • People are using this artifact in their Maven projects, independently of Solr
          • Anyone importing the solr-core artifact in its project will be unable to use Commons CSV when it's released
          Show
          Emmanuel Bourg added a comment - Really? who will stop us? Your morality, kindness and innate sense of teamwork among Apache developers ? You may have not noticed that solr-commons-csv is in the central Maven repository, it's not just secretly included in the solr/lib directory: http://search.maven.org/#artifactdetails|org.apache.solr|solr-commons-csv|3.5.0|jar That means two things: People are using this artifact in their Maven projects, independently of Solr Anyone importing the solr-core artifact in its project will be unable to use Commons CSV when it's released
          Hide
          Robert Muir added a comment -

          You are releasing code in the Commons namespace that eventually will clash

          We aren't releasing any code.

          We have a library in solr/lib. we are depending upon this library. we don't release it.

          This issue is only about maven, which is just supplied as convenience, as Uwe mentions.

          Show
          Robert Muir added a comment - You are releasing code in the Commons namespace that eventually will clash We aren't releasing any code. We have a library in solr/lib. we are depending upon this library. we don't release it. This issue is only about maven, which is just supplied as convenience, as Uwe mentions.
          Hide
          Emmanuel Bourg added a comment -

          Here is a patch importing the Commons CSV revision 966014 in the org.apache.solr.csv package to avoid classpath conflicts

          Show
          Emmanuel Bourg added a comment - Here is a patch importing the Commons CSV revision 966014 in the org.apache.solr.csv package to avoid classpath conflicts
          Hide
          Robert Muir added a comment -

          -1

          I don't think we should code-dup entire packages just because of the viral nature of maven
          (to make a maven release of X, all of its dependencies must also be in maven, too).

          That seems like the root problem here. Perhaps a blocker issue should be opened in maven's JIRA instead.

          Show
          Robert Muir added a comment - -1 I don't think we should code-dup entire packages just because of the viral nature of maven (to make a maven release of X, all of its dependencies must also be in maven, too). That seems like the root problem here. Perhaps a blocker issue should be opened in maven's JIRA instead.
          Hide
          Uwe Schindler added a comment -

          Just to add: Should we also clone/fork the whole Mortbay Jetty repository or Apache Xerces into Solr's tree? For both libraries we also have bugfixes but upstream did not release an update until now?

          What will user X do, if the probject he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.

          Show
          Uwe Schindler added a comment - Just to add: Should we also clone/fork the whole Mortbay Jetty repository or Apache Xerces into Solr's tree? For both libraries we also have bugfixes but upstream did not release an update until now? What will user X do, if the probject he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.
          Hide
          Uwe Schindler added a comment -

          One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done?

          Uwe

          Show
          Uwe Schindler added a comment - One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done? Uwe
          Hide
          Ralph Goers added a comment -

          Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.

          As for snapshots, that doesn't really solve the problem as you can't release with a dependency on a snapshot.

          If the intent was just to use the jar internally in a war file that your customers don't change then somewhere in your build you have a mistake as the jar is apparently being made available to the public. If, however, the dependency is required for users or Solr to do builds then you need to change the package names so that they don't conflict with those in Commons.

          Show
          Ralph Goers added a comment - Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen. As for snapshots, that doesn't really solve the problem as you can't release with a dependency on a snapshot. If the intent was just to use the jar internally in a war file that your customers don't change then somewhere in your build you have a mistake as the jar is apparently being made available to the public. If, however, the dependency is required for users or Solr to do builds then you need to change the package names so that they don't conflict with those in Commons.
          Hide
          Robert Muir added a comment -

          Ralph: see Uwe's comment.

          Its frequently the case that lucene/solr depend on things that are not in maven. When this is the case, the guys who work on maven here
          do whatever is necessary to make maven work (I really don't know the details, most of us just use ant).

          This is purely a maven issue, and as mentioned earlier, maven is provided as a convenience for build configuration, no different from
          convenience build configurations provided for Eclipse or Intellij IDEA.

          Thats why I said: we aren't releasing anything. there is just commons-csv-1.0-SNAPSHOT-r966014.jar in our solr/lib. Our ant build system
          packages this up into solr.war

          As for maven, eclipse, intellij, netbeans, whatever, it doesnt really matter to me how they work, but supporting those alternative
          build configurations != releasing code.

          Show
          Robert Muir added a comment - Ralph: see Uwe's comment. Its frequently the case that lucene/solr depend on things that are not in maven. When this is the case, the guys who work on maven here do whatever is necessary to make maven work (I really don't know the details, most of us just use ant). This is purely a maven issue, and as mentioned earlier, maven is provided as a convenience for build configuration, no different from convenience build configurations provided for Eclipse or Intellij IDEA. Thats why I said: we aren't releasing anything. there is just commons-csv-1.0-SNAPSHOT-r966014.jar in our solr/lib. Our ant build system packages this up into solr.war As for maven, eclipse, intellij, netbeans, whatever, it doesnt really matter to me how they work, but supporting those alternative build configurations != releasing code.
          Hide
          Emmanuel Bourg added a comment -

          One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done?

          Commons CSV snapshots are available in the Apache snapshot repository, but when you release Solr in the central repository all its dependencies have to be there as well, so that won't work.

          What will user X do, if the project he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here.

          Once Commons CSV is released we guarantee the binary compatibility of the next releases. The Apache Commons team takes compatibility issues very seriously because our components are reused in many projects. When we have to break the compatibility the code is moved to a new package to avoid conflicts. For example Commons Lang 3 broke the binary compatibility with the previous releases and used a new namespace org.apache.commons.lang3 to avoid issues.

          If Solr wants to shield its user from incompatibilities, even after Commons CSV is officially released, it's possible to automate the process of renaming the package of the dependencies. That's what Tomcat does with Commons DBCP. Tools like the Maven Shade plugin or the JarJar Ant tasks are designed for this purpose.

          Show
          Emmanuel Bourg added a comment - One idea how to solve the issue: Maybe commons-csv should publish their nightly builds through jenkins to maven? Then Solr could add the snapshot build as a dependency and we are done? Commons CSV snapshots are available in the Apache snapshot repository, but when you release Solr in the central repository all its dependencies have to be there as well, so that won't work. What will user X do, if the project he is using depends on an older CSV version but Solr needs a new one, although in Maven Central? Yes, he has a conflict. The situation is not different here. Once Commons CSV is released we guarantee the binary compatibility of the next releases. The Apache Commons team takes compatibility issues very seriously because our components are reused in many projects. When we have to break the compatibility the code is moved to a new package to avoid conflicts. For example Commons Lang 3 broke the binary compatibility with the previous releases and used a new namespace org.apache.commons.lang3 to avoid issues. If Solr wants to shield its user from incompatibilities, even after Commons CSV is officially released, it's possible to automate the process of renaming the package of the dependencies. That's what Tomcat does with Commons DBCP. Tools like the Maven Shade plugin or the JarJar Ant tasks are designed for this purpose.
          Hide
          Uwe Schindler added a comment -

          Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen.

          We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.

          Show
          Uwe Schindler added a comment - Robert, if you are not releasing code then how did the artifact find its way into the Maven Central repository? The maintainers of the repository will ONLY accept artifacts when they come from the appropriate place - in this case that would need to be Apache's repository. I can't imagine how anyone other than Solr's team could have caused that to happen. We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.
          Hide
          Torsten Curdt added a comment -

          You guys don't like maven - we get that (who really does?) ...but that's not the point. For now let's just assume your bugfix jars did not get released to maven central. (That's another issue to deal with)

          What about Solr users that now have some unreleases Xerces, Jetty und Commons classes in their classpath? Maybe even without knowing? If you don't hate your users - don't do it.

          Maintaining upstreaming patches is painful - but that's how it is. Push for a release, maintain a patch or (only worst case) fork.

          But there is an option somewhere in the middle is using using tools like minijar/the shade plugin/jarjar. They let you do the rewriting of the package transparently on the byte code level. That makes getting back to the releases version easier. Or you do it like Tomcat - they so some search and replace in their build.

          I your specific case I would keep the bugfix jars in your lib dir (just like you have now) but modify your build to include a relocated version of your bugfix jars into the Solr jar (by using jarjar). Problem solved!

          No one needs those bugfix version in a repo and it's super easy to fall back to the official release once out.

          There are ways around big code duplications and still avoid the namespace problem.
          Come on guys - be good to the world

          Show
          Torsten Curdt added a comment - You guys don't like maven - we get that (who really does?) ...but that's not the point. For now let's just assume your bugfix jars did not get released to maven central. (That's another issue to deal with) What about Solr users that now have some unreleases Xerces, Jetty und Commons classes in their classpath? Maybe even without knowing? If you don't hate your users - don't do it. Maintaining upstreaming patches is painful - but that's how it is. Push for a release, maintain a patch or (only worst case) fork. But there is an option somewhere in the middle is using using tools like minijar/the shade plugin/jarjar. They let you do the rewriting of the package transparently on the byte code level. That makes getting back to the releases version easier. Or you do it like Tomcat - they so some search and replace in their build. I your specific case I would keep the bugfix jars in your lib dir (just like you have now) but modify your build to include a relocated version of your bugfix jars into the Solr jar (by using jarjar). Problem solved! No one needs those bugfix version in a repo and it's super easy to fall back to the official release once out. There are ways around big code duplications and still avoid the namespace problem. Come on guys - be good to the world
          Hide
          Sebb added a comment -

          This is a classpath issue - a class can be loaded only once in a given classpath.
          It's vitally important to ensure that each class has only a single definition in a classpath, otherwise all sorts of failures can occur. These failures can be extremely difficult to debug.

          Maven uses the groupId/artifactId to resolve multiple references to the same jar.
          It picks one copy of the most recent version.

          This only works properly if the Maven ids are unique for each package.
          It does not read the package names within the jar; it assumes that the POM is correct for the associated jar.

          In this case, the Solr pom uses its own unique groupId/artifactId, but the package names used within the jar (and indeed the pom inside the jar) are for an entirely different package. This is totally contrary to the way Maven is supposed to be used.

          The consequence is that Maven does not know that the renamed "solr" commons-csv jar uses the same package as the "real" commons-csv, so it cannot prevent duplicate classes on the classpath.

          Show
          Sebb added a comment - This is a classpath issue - a class can be loaded only once in a given classpath. It's vitally important to ensure that each class has only a single definition in a classpath, otherwise all sorts of failures can occur. These failures can be extremely difficult to debug. Maven uses the groupId/artifactId to resolve multiple references to the same jar. It picks one copy of the most recent version. This only works properly if the Maven ids are unique for each package. It does not read the package names within the jar; it assumes that the POM is correct for the associated jar. In this case, the Solr pom uses its own unique groupId/artifactId, but the package names used within the jar (and indeed the pom inside the jar) are for an entirely different package. This is totally contrary to the way Maven is supposed to be used. The consequence is that Maven does not know that the renamed "solr" commons-csv jar uses the same package as the "real" commons-csv, so it cannot prevent duplicate classes on the classpath.
          Hide
          Steve Rowe added a comment -

          I think we should fix this.

          We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it.

          Because of Solr's dependency on a snapshot of commons-csv, and because Maven releases are not allowed to depend on snapshots, we have to publish the snapshot, renamed not to appear to be a snapshot. Simply removing it is not an option, assuming we will continue to provide Maven artifacts as part of the Lucene/Solr release process.

          Show
          Steve Rowe added a comment - I think we should fix this. We are releasing Maven Artifacts of Solr just for convenience using a special ANT script (which is unsupported by our committers). The community wants it, so we do it. Maybe that script also published one of the snapshot artifacts (is there also a Jetty prerelease artifact?); but that was not our intention. We should simply remove it. Because of Solr's dependency on a snapshot of commons-csv, and because Maven releases are not allowed to depend on snapshots, we have to publish the snapshot, renamed not to appear to be a snapshot. Simply removing it is not an option, assuming we will continue to provide Maven artifacts as part of the Lucene/Solr release process.
          Hide
          Robert Muir added a comment -

          Maintaining upstreaming patches is painful - but that's how it is.

          If there is a bug in some 3rd party component: i care about fixing the bug.

          I don't care about maven, I'm not going to do the extra work to deal with these viral issues and its not a blocker for releasing 3.6

          Show
          Robert Muir added a comment - Maintaining upstreaming patches is painful - but that's how it is. If there is a bug in some 3rd party component: i care about fixing the bug. I don't care about maven, I'm not going to do the extra work to deal with these viral issues and its not a blocker for releasing 3.6
          Hide
          Emmanuel Bourg added a comment -

          There is a similar issue with the solr-carrot2-core, solr-noggit, solr-uima and solr-jsonic artifacts, and probably more.

          You were right to -1 my patch Robert, the issue is much deeper.

          Show
          Emmanuel Bourg added a comment - There is a similar issue with the solr-carrot2-core, solr-noggit, solr-uima and solr-jsonic artifacts, and probably more. You were right to -1 my patch Robert, the issue is much deeper.
          Hide
          Uwe Schindler added a comment -

          I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name. We should create jarjared versions of all those changed snapshot dpenendecies, so the use o.a.solr.internal./o.a.lucene.internal. as package name (like Java does for Xerces/Xalan/Rhino).

          Those jars would be published by maven as it is now, but would never conflict.

          Show
          Uwe Schindler added a comment - I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name. We should create jarjared versions of all those changed snapshot dpenendecies, so the use o.a.solr.internal. /o.a.lucene.internal. as package name (like Java does for Xerces/Xalan/Rhino). Those jars would be published by maven as it is now, but would never conflict.
          Hide
          Torsten Curdt added a comment -

          Steve, no you don't have to publish those snapshots. Please read my previous comment about inlining the bugfix jars.

          Please folks, look at http://code.google.com/p/jarjar/
          It's an easy fix and you would make many people happy.

          Show
          Torsten Curdt added a comment - Steve, no you don't have to publish those snapshots. Please read my previous comment about inlining the bugfix jars. Please folks, look at http://code.google.com/p/jarjar/ It's an easy fix and you would make many people happy.
          Hide
          Robert Muir added a comment -

          I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name.

          This sets an unreasonable burden to fix a simple bug in a third-party component (think, the jetty patch).

          Its already hard enough to dive in as "stranger in a strange land" to some other codebase to fix a bug that affects our stuff,
          and then to rename anything that uses it too just because of maven?

          Lets take the jetty case... should we seriously rename all the jetty stuff because it has a unicode bug? what about configuration
          and reflection and stuff for this servlet container? E.g. users will no longer be able to use any normal jetty configuration or
          jetty documentation because we re-jarjared all the package names?

          <Configure id="Server" class="org.eclipse.jetty.server.Server">
            <!-- basic configuration here -->
          </Configure>
          

          etc

          this is insane.

          Show
          Robert Muir added a comment - I think the jarjar idea was raised quite often, this seems to be a good solution, so we republish all "changed" dependencies using a different class/pkg name. This sets an unreasonable burden to fix a simple bug in a third-party component (think, the jetty patch). Its already hard enough to dive in as "stranger in a strange land" to some other codebase to fix a bug that affects our stuff, and then to rename anything that uses it too just because of maven? Lets take the jetty case... should we seriously rename all the jetty stuff because it has a unicode bug? what about configuration and reflection and stuff for this servlet container? E.g. users will no longer be able to use any normal jetty configuration or jetty documentation because we re-jarjared all the package names? <Configure id="Server" class="org.eclipse.jetty.server.Server"> <!-- basic configuration here --> </Configure> etc this is insane.
          Hide
          Torsten Curdt added a comment -

          Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.

          Show
          Torsten Curdt added a comment - Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.
          Hide
          Torsten Curdt added a comment -

          Well, I do see the problem with jetty but doubt the other dependencies are that problematic.

          Show
          Torsten Curdt added a comment - Well, I do see the problem with jetty but doubt the other dependencies are that problematic.
          Hide
          Robert Muir added a comment -

          Well, I do see the problem with jetty but doubt the other dependencies are that problematic.
          {quote

          Jarjaring is dangerous also because of reflection in from other libraries.

          For example, if we find a bug in ICU, submit it, and they commit it, we should be able to
          depend upon a snapshot unreleased build of ICU, yet, at the same-time, arabic PDF extraction
          should still work (apache PDFBox reflects the icu package names to determine if this works).

          So renaming packages with jarjar is not safe for a number of reasons, I really don't think
          we should do this just to satisfy maven.

          Show
          Robert Muir added a comment - Well, I do see the problem with jetty but doubt the other dependencies are that problematic. {quote Jarjaring is dangerous also because of reflection in from other libraries. For example, if we find a bug in ICU, submit it, and they commit it, we should be able to depend upon a snapshot unreleased build of ICU, yet, at the same-time, arabic PDF extraction should still work (apache PDFBox reflects the icu package names to determine if this works). So renaming packages with jarjar is not safe for a number of reasons, I really don't think we should do this just to satisfy maven.
          Hide
          Uwe Schindler added a comment -

          Robert: Its not only a maven problem, its the DLL ähm JAR-hell So I agree with all others that having patched JARs is a no-go for production, because nobody is able to then satisfy all dependencies.

          On the other hand, Solr is a end-user product, so I have no idea why it is in maven at all. You download the distribution and install it. If you need to compile plugins against it, setup your eclipse classpath. Lucene is different, but it has no patched JARs (it does not depend on any external JAR at all). The Xerces JAR in benchmark is also only a "utility" package, that should not be in maven.

          To come back to the example:

          • Jetty should not be a problem, because Solr will upgrade to Jetty 8 (breaking lots of backwards). For the Jetty 6 series there is no support at all, so there will not be a new release. Our fixed Jetty 6 is just a plain jetty with one broken unicode method patched. If somebody accidently have it in his classpath, he will be happy that Chinese, Japanese, and Mormon UTF8 works correct...
          • For the other JARs that have no other relations except to Solr, JARJARing them is no issue at all. It would be what I would recommend. This affects commons JARs like commons-csv, also noggit,
          • The ICU example is just a minor "example", we will not need to patch ICU, because they are releasing quite often.
          • If you have reflection dependecies or dependencies from other JARs in you classpath, you can still install the original unpatched version and you are happy. They will not conflict.
          Show
          Uwe Schindler added a comment - Robert: Its not only a maven problem, its the DLL ähm JAR-hell So I agree with all others that having patched JARs is a no-go for production, because nobody is able to then satisfy all dependencies. On the other hand, Solr is a end-user product, so I have no idea why it is in maven at all. You download the distribution and install it. If you need to compile plugins against it, setup your eclipse classpath. Lucene is different, but it has no patched JARs (it does not depend on any external JAR at all). The Xerces JAR in benchmark is also only a "utility" package, that should not be in maven. To come back to the example: Jetty should not be a problem, because Solr will upgrade to Jetty 8 (breaking lots of backwards). For the Jetty 6 series there is no support at all, so there will not be a new release. Our fixed Jetty 6 is just a plain jetty with one broken unicode method patched. If somebody accidently have it in his classpath, he will be happy that Chinese, Japanese, and Mormon UTF8 works correct... For the other JARs that have no other relations except to Solr, JARJARing them is no issue at all. It would be what I would recommend. This affects commons JARs like commons-csv, also noggit, The ICU example is just a minor "example", we will not need to patch ICU, because they are releasing quite often. If you have reflection dependecies or dependencies from other JARs in you classpath, you can still install the original unpatched version and you are happy. They will not conflict.
          Hide
          Ryan McKinley added a comment -

          How close is commons-csv to a release?

          Show
          Ryan McKinley added a comment - How close is commons-csv to a release?
          Hide
          Emmanuel Bourg added a comment -

          Not close enough to satisfy Solr's pace of releases I think. Any help would be appreciated, the API has been reworked recently to take advantage of Java 5 features. A feedback on how it integrates with Solr would be welcome.

          Show
          Emmanuel Bourg added a comment - Not close enough to satisfy Solr's pace of releases I think. Any help would be appreciated, the API has been reworked recently to take advantage of Java 5 features. A feedback on how it integrates with Solr would be welcome.
          Hide
          Steve Rowe added a comment -

          I propose we jarjar only the currently used commons-csv snapshot, and leave all other jars as-is. (Uwe, you mentioned noggit, but it has never been released on its own, so I don't think it's necessary to jarjar it.)

          A feedback on how it integrates with Solr would be welcome.

          The 3.6 release is going to happen soon; it seems unwise to upgrade on branch_3x at this point.

          On trunk, however, upgrading the commons-csv snapshot to a more recent revision is a good idea, to prepare for the (eventual) release.

          Show
          Steve Rowe added a comment - I propose we jarjar only the currently used commons-csv snapshot, and leave all other jars as-is. (Uwe, you mentioned noggit, but it has never been released on its own, so I don't think it's necessary to jarjar it.) A feedback on how it integrates with Solr would be welcome. The 3.6 release is going to happen soon; it seems unwise to upgrade on branch_3x at this point. On trunk, however, upgrading the commons-csv snapshot to a more recent revision is a good idea, to prepare for the (eventual) release.
          Hide
          Ryan McKinley added a comment -

          Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build?

          Show
          Ryan McKinley added a comment - Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build?
          Hide
          Steve Rowe added a comment -

          Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build.

          Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.

          Show
          Steve Rowe added a comment - Uwe, no need to even publish those bugfix jars. I would just inline them into the Solr artifact. (Think ueberjar) Then you don't even have to change a line of java source code nor the bugfix jars. Just the build. Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.
          Hide
          Emmanuel Bourg added a comment -

          This is not possible Ryan. But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 and stop publishing new artifacts to Maven central besides the solr/lucene specific ones. This will stop the proliferation of conflicting artifacts in the Maven repository, but it won't fix the classpath issue for projects using Solr and Commons CSV.

          Show
          Emmanuel Bourg added a comment - This is not possible Ryan. But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 and stop publishing new artifacts to Maven central besides the solr/lucene specific ones. This will stop the proliferation of conflicting artifacts in the Maven repository, but it won't fix the classpath issue for projects using Solr and Commons CSV.
          Hide
          Steve Rowe added a comment -

          Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build?

          I don't know the answer, but I object to going down this road. There is a reason for not depending on snapshots: they change, often incompatibly. Having internal snapshots isn't just a sneaky workaround - it fixes the underlying problem (potentially incompatible changes), while also, of course, creating a different problem...

          We have an example in the specific case of commons-csv: as Emmanuel said, the API has changed (very likely incompatibly) since our internal snapshot was taken.

          Show
          Steve Rowe added a comment - Since we don't really deploy with maven, can we just point the maven artifact to the snapshot build? I don't know the answer, but I object to going down this road. There is a reason for not depending on snapshots: they change, often incompatibly. Having internal snapshots isn't just a sneaky workaround - it fixes the underlying problem (potentially incompatible changes), while also, of course, creating a different problem... We have an example in the specific case of commons-csv: as Emmanuel said, the API has changed (very likely incompatibly) since our internal snapshot was taken.
          Hide
          Steve Rowe added a comment -

          But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6

          I think this is a good idea, and would require very little effort. I'll make a patch.

          Show
          Steve Rowe added a comment - But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 I think this is a good idea, and would require very little effort. I'll make a patch.
          Hide
          Emmanuel Bourg added a comment -

          Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter.

          Steven, this will not increase the global size of the distribution, it might even reduce it. You are trading a core Solr jar with a few modified dependent jars, all downloaded from the central Maven repository, for a unique jar containing Solr and it dependencies, and possibly only the necessary classes. For example Solr doesn't use the writer package of Commons CSV, it can be automatically removed in the process to reduce the size of the dependencies.

          Still not interested?

          Show
          Emmanuel Bourg added a comment - Torsten, in case you're wondering why nobody is answering you: Lucene/Solr publishes Maven artifacts that are identical to the released Ant-built binary jars, so your proposal would require increasing the size of our released jars, solely to solve a Maven dependency problem. Changing the Ant build simply to solve Maven problems has historically been a non-starter. Steven, this will not increase the global size of the distribution, it might even reduce it. You are trading a core Solr jar with a few modified dependent jars, all downloaded from the central Maven repository, for a unique jar containing Solr and it dependencies, and possibly only the necessary classes. For example Solr doesn't use the writer package of Commons CSV, it can be automatically removed in the process to reduce the size of the dependencies. Still not interested?
          Hide
          Steve Rowe added a comment -

          Still not interested?

          I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter.

          Show
          Steve Rowe added a comment - Still not interested? I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter.
          Hide
          Ryan McKinley added a comment -

          But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6

          +1

          Show
          Ryan McKinley added a comment - But you might reference org.apache.solr:solr-commons-csv:3.5.0 in Solr 3.6 +1
          Hide
          Emmanuel Bourg added a comment -

          Steven, you created this Maven problem by releasing conflicting artifacts to the central repository. Refusing to fix it is irresponsible.

          Show
          Emmanuel Bourg added a comment - Steven, you created this Maven problem by releasing conflicting artifacts to the central repository. Refusing to fix it is irresponsible.
          Hide
          Steve Rowe added a comment -

          Steven, you created this Maven problem by releasing conflicting artifacts to the central repository.

          In fact, I am personally responsible for putting the conflicting artifacts in the central repository. It is also largely as a result of my efforts that Lucene/Solr continues to publish Maven artifacts at all.

          Refusing to fix it is irresponsible.

          Only if one assumes that refusing to implement your proposed fix is equivalent to refusing to do anything? I suspect we disagree on this point.

          Show
          Steve Rowe added a comment - Steven, you created this Maven problem by releasing conflicting artifacts to the central repository. In fact, I am personally responsible for putting the conflicting artifacts in the central repository. It is also largely as a result of my efforts that Lucene/Solr continues to publish Maven artifacts at all. Refusing to fix it is irresponsible. Only if one assumes that refusing to implement your proposed fix is equivalent to refusing to do anything? I suspect we disagree on this point.
          Hide
          Ryan McKinley added a comment -

          Steven, you created this Maven problem

          Lets keep our heads cool here. To be clear, Steven wants to help and is doing what he can. We need to figure a way to help within the limits of the lucene community. Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits.

          The obvious best solution is to point to an official commons-csv release.

          Short of that, we need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package.

          Show
          Ryan McKinley added a comment - Steven, you created this Maven problem Lets keep our heads cool here. To be clear, Steven wants to help and is doing what he can. We need to figure a way to help within the limits of the lucene community. Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits. The obvious best solution is to point to an official commons-csv release. Short of that, we need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package.
          Hide
          Robert Muir added a comment -

          Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits.

          Its not unjustified animosity towards maven: this issue is solely maven

          Show
          Robert Muir added a comment - Previous discussion in this thread should give you a sense of the animosity towards maven – the question is how to move forward within these limits. Its not unjustified animosity towards maven: this issue is solely maven
          Hide
          Emmanuel Bourg added a comment - - edited

          Steven I wasn't referring to you explicitly, but to the Solr community. I didn't know you actually deployed the artifacts. If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity

          I'm glad to hear you are willing to do something, what's your plan? Let me recap the options currently identified:

          1. Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc)
          2. Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.
          3. Keep publishing Solr specific dependencies but after renaming the packages with JarJar
          4. Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)
          5. Stop releasing Maven artifacts completely, even Solr core.

          Pointing to the 3.5 artifacts is a temporary solution, because:

          • the classpath conflicts still exists for projects importing solr-core
          • the next time Solr needs a patched unreleased dependency it will be problematic
          Show
          Emmanuel Bourg added a comment - - edited Steven I wasn't referring to you explicitly, but to the Solr community. I didn't know you actually deployed the artifacts. If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity I'm glad to hear you are willing to do something, what's your plan? Let me recap the options currently identified: Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc) Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr. Keep publishing Solr specific dependencies but after renaming the packages with JarJar Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution) Stop releasing Maven artifacts completely, even Solr core. Pointing to the 3.5 artifacts is a temporary solution, because: the classpath conflicts still exists for projects importing solr-core the next time Solr needs a patched unreleased dependency it will be problematic
          Hide
          Michael McCandless added a comment -

          To be clear, Steven wants to help and is doing what he can.

          I would make this statement far stronger:

          Without all of Steven's hard work struggling with Maven, while working
          within the constraints of the other Lucene/Solr committers (who
          generally, strongly, dislike the impositions Maven places on a
          project's development, case in point), there would be no Maven
          artifacts for Lucene nor Solr.

          Seriously, Steven is the ONLY reason why Lucene and Solr have, and
          continue to have, Maven artifacts.

          Thank you Steven!!

          Show
          Michael McCandless added a comment - To be clear, Steven wants to help and is doing what he can. I would make this statement far stronger: Without all of Steven's hard work struggling with Maven, while working within the constraints of the other Lucene/Solr committers (who generally, strongly, dislike the impositions Maven places on a project's development, case in point), there would be no Maven artifacts for Lucene nor Solr. Seriously, Steven is the ONLY reason why Lucene and Solr have, and continue to have, Maven artifacts. Thank you Steven!!
          Hide
          Steve Rowe added a comment -

          Patch switching solr-commons-csv Maven dependency to v3.5.0.

          I will commit in a couple of hours if there are no objections.

          Show
          Steve Rowe added a comment - Patch switching solr-commons-csv Maven dependency to v3.5.0. I will commit in a couple of hours if there are no objections.
          Hide
          Steve Rowe added a comment -

          Pointing to the 3.5 artifacts is a temporary solution

          I agree. I also agree with Ryan: "[W]e need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package."

          If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity

          I actually like Maven for some things.

          "Great blow"? I sincerely doubt that this is the first time something like this has happened.

          "Maven integrity"? Oxymoronic much?

          Show
          Steve Rowe added a comment - Pointing to the 3.5 artifacts is a temporary solution I agree. I also agree with Ryan: " [W] e need to make sure we don't make the problem worse; We can do that by releasing 3.6 pointing to the 3.5 solr-commons-csv package." If you dislike Maven you'll be pleased to learn that it was a great blow to the Maven integrity I actually like Maven for some things. "Great blow"? I sincerely doubt that this is the first time something like this has happened. "Maven integrity"? Oxymoronic much?
          Hide
          Steve Rowe added a comment -

          I'm glad to hear you are willing to do something, what's your plan?

          I don't have one yet. What's your plan for releasing commons-csv?

          Let me recap the options currently identified:

          1. Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc)

          -1. I think this is too intrusive. Maven's requirement cannot dictate Lucene/Solr Ant build structure.

          2. Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr.

          -1. Again, too intrusive.

          3. Keep publishing Solr specific dependencies but after renaming the packages with JarJar

          I think this is a possibility.

          4. Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution)

          -0. The project would require a new module for each dependency. Seems too heavy to me.

          5. Stop releasing Maven artifacts completely, even Solr core.

          Fantastic! Thanks for your input! You're, like, awesome!

          Show
          Steve Rowe added a comment - I'm glad to hear you are willing to do something, what's your plan? I don't have one yet. What's your plan for releasing commons-csv? Let me recap the options currently identified: 1. Embed the Commons CSV source files in the core of Solr. That was the purpose of my patch, but it doesn't solve the issue for the other dependencies (carrot, noggit, uima, jsonic, etc) -1. I think this is too intrusive. Maven's requirement cannot dictate Lucene/Solr Ant build structure. 2. Embed the pacthed dependencies in the main Solr jar and rename the packages with jarjar or a similar tool. The additional benefit it a reduction of the global distribution size by keeping only the classes actually used by Solr. -1. Again, too intrusive. 3. Keep publishing Solr specific dependencies but after renaming the packages with JarJar I think this is a possibility. 4. Import the source of the dependencies and rename the package with the Ant replace task (Tomcat solution) -0. The project would require a new module for each dependency. Seems too heavy to me. 5. Stop releasing Maven artifacts completely, even Solr core. Fantastic! Thanks for your input! You're, like, awesome!
          Hide
          Steve Rowe added a comment -

          Emmanuel, you neglected to include transitive exclusion in your "identified options":

          6. Ryan suggested excluding solr-commons-csv from solr-core's transitive dependencies: https://issues.apache.org/jira/browse/SOLR-3204?focusedCommentId=13222930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13222930

          This puts the burden on users, who won't know it's necessary until after they run into classpath trouble. We should figure out something better.

          Show
          Steve Rowe added a comment - Emmanuel, you neglected to include transitive exclusion in your "identified options": 6. Ryan suggested excluding solr-commons-csv from solr-core's transitive dependencies: https://issues.apache.org/jira/browse/SOLR-3204?focusedCommentId=13222930&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13222930 This puts the burden on users, who won't know it's necessary until after they run into classpath trouble. We should figure out something better.
          Hide
          Robert Muir added a comment -

          We should figure out something better.

          Not knowing anything about maven, but just comparing this to the releases that solr users download, the main
          difference seems to be that in general the 'artifact' for an ordinary user is a .war file, a server-side
          webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that?

          Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file,
          such as any external dependencies?

          Show
          Robert Muir added a comment - We should figure out something better. Not knowing anything about maven, but just comparing this to the releases that solr users download, the main difference seems to be that in general the 'artifact' for an ordinary user is a .war file, a server-side webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that? Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file, such as any external dependencies?
          Hide
          Emmanuel Bourg added a comment - - edited

          @Steven Yes that's a possible workaround for Solr users, as long as they don't touch directly or indirectly the CSV related classes. But that solution doesn't spare a necessary modification of the Solr build with regard to patched dependencies.

          Show
          Emmanuel Bourg added a comment - - edited @Steven Yes that's a possible workaround for Solr users, as long as they don't touch directly or indirectly the CSV related classes. But that solution doesn't spare a necessary modification of the Solr build with regard to patched dependencies.
          Hide
          Steve Rowe added a comment - - edited

          Solr is a end-user product, so I have no idea why it is in maven at all.

          the 'artifact' for an ordinary user is a .war file, a server-side webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that? Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file, such as any external dependencies?

          Solr can be used in embedded fashion (though I don't have any experience with that).

          Also, solrj is a client jar.

          edit: oops, solrj isn't relevant here - we're talking about solr-core.

          Show
          Steve Rowe added a comment - - edited Solr is a end-user product, so I have no idea why it is in maven at all. the 'artifact' for an ordinary user is a .war file, a server-side webapp that a user places into their servlet container... maybe the maven artifact should be consistent with that? Wouldn't this avoid exposing any internal implementation details of whats actually inside that .war file, such as any external dependencies? Solr can be used in embedded fashion (though I don't have any experience with that). Also, solrj is a client jar. edit : oops, solrj isn't relevant here - we're talking about solr-core.
          Hide
          Ryan McKinley added a comment -

          The real problem (regardless of ant/maven) is for users that want to use o.a.c.csv classes that are not the ones in the solr distribution. This can happen when people have plugins using something based on a recent CSV build – something will break, either their code, or the CSVResponseWriter

          This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace. I understand why the csv folks are pissed, and think it is fair we look for ways to fix it.

          I am now +1 on Emmanuel's patch to embed o.a.solr.csv in the solr core jar file – we should have notes that it won't be supported once a real csv is released.

          Show
          Ryan McKinley added a comment - The real problem (regardless of ant/maven) is for users that want to use o.a.c.csv classes that are not the ones in the solr distribution. This can happen when people have plugins using something based on a recent CSV build – something will break, either their code, or the CSVResponseWriter This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace. I understand why the csv folks are pissed, and think it is fair we look for ways to fix it. I am now +1 on Emmanuel's patch to embed o.a.solr.csv in the solr core jar file – we should have notes that it won't be supported once a real csv is released.
          Hide
          Robert Muir added a comment -

          This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace.

          This is a 100% maven and only maven issue.

          For the ant build we are just using a snapshot jar in solr/lib, not 'releasing' nor 'putting classes' anywhere.
          Its an implementation detail of solr. Its simply a dependency.

          Show
          Robert Muir added a comment - This is not a maven issue, it is a question of who should put classes into the org.apache.commons.csv namespace. This is a 100% maven and only maven issue. For the ant build we are just using a snapshot jar in solr/lib, not 'releasing' nor 'putting classes' anywhere. Its an implementation detail of solr. Its simply a dependency.
          Hide
          Ryan McKinley added a comment -

          Sorry, by 'put classes', i mean that when commons-csv-1.0-SNAPSHOT-r966014.jar is in the runtime classpath it uses classes in the namespace org.apache.commons.csv

          By using this namespace, it prevents people from using other dependencies with that same package name reliably. The commons-csv folks are understandably upset that that means their jar file does not work as expected within solr.

          Consider the case where someone deploys the solr.war file (built by ant) with a plugin that uses the current commons-csv.jar from /trunk. something will break – either the CSVRequestHandler or their plugin depending on what is first in the classpath. (I think their plugin would always fail since it is loaded second)

          I think we have gotten lost in the maven discussion, when the real problem is about how our CSV.jar interacts with plugins (or anything else in the runtime classpath)

          Show
          Ryan McKinley added a comment - Sorry, by 'put classes', i mean that when commons-csv-1.0-SNAPSHOT-r966014.jar is in the runtime classpath it uses classes in the namespace org.apache.commons.csv By using this namespace, it prevents people from using other dependencies with that same package name reliably. The commons-csv folks are understandably upset that that means their jar file does not work as expected within solr. Consider the case where someone deploys the solr.war file (built by ant) with a plugin that uses the current commons-csv.jar from /trunk. something will break – either the CSVRequestHandler or their plugin depending on what is first in the classpath. (I think their plugin would always fail since it is loaded second) I think we have gotten lost in the maven discussion, when the real problem is about how our CSV.jar interacts with plugins (or anything else in the runtime classpath)
          Hide
          Robert Muir added a comment -

          This issue isnt about plugins, having commons-csv in the classpath only affect users of solr.

          Please don't try to spin the issue, this issue is all about maven and only about maven:

          • the very title of this jira issue 'solr-commons-csv' is a maven-only concept
          • as stated by Emmanuel in his first response back 'Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects'

          So thats the only issue, maven artifacts.

          Show
          Robert Muir added a comment - This issue isnt about plugins, having commons-csv in the classpath only affect users of solr . Please don't try to spin the issue, this issue is all about maven and only about maven: the very title of this jira issue 'solr-commons-csv' is a maven-only concept as stated by Emmanuel in his first response back 'Even if you consider solr-commons-csv to be an internal part of Solr it's now a public Maven artifact that people are starting to use in their projects' So thats the only issue, maven artifacts.
          Hide
          Ryan McKinley added a comment -

          I'm not trying to spin anything – I'm trying to figure out what the real issue is. Part of it is obviously the maven release. Emmanuel can you tell us more about how this came up?

          Show
          Ryan McKinley added a comment - I'm not trying to spin anything – I'm trying to figure out what the real issue is. Part of it is obviously the maven release. Emmanuel can you tell us more about how this came up?
          Hide
          Emmanuel Bourg added a comment -

          Your analysis is correct Ryan. Anyone building upon the Solr code is affected by this issue. The fact the offending artifacts are in the central Maven repository only makes the problem worse,

          I discovered this issue because someone who imported solr-commons-csv in its project reported an issue (SANDBOX-330), an infinite loop with trailing comments, that was already fixed on the trunk. I didn't know about the solr-commons-csv artifact in the central Maven repository until then. So I inspected the jar and gasped in awe at the unmodified namespace.

          I understand that Maven as a build tool is controversial, I'm myself an heavy Ant user on several projects requiring complex build setups. But Maven, as a way to share code through a central repository, is also a very valuable tool. It's adopted by alternative build tools like Ivy and Gradle. It's also used by IDEs to import dependencies automatically and to be able to browse their source and Javadoc easily. It's difficult to understand why this part is also hated by Maven detractors. The constraints on the central repository aren't that unbearable, the most important rule to follow is to not publish two artifacts identified by distinct groupId:artifactId with conflicting classes. The corollary is that a project A should not publish classes in the namespace of a project B.

          Show
          Emmanuel Bourg added a comment - Your analysis is correct Ryan. Anyone building upon the Solr code is affected by this issue. The fact the offending artifacts are in the central Maven repository only makes the problem worse, I discovered this issue because someone who imported solr-commons-csv in its project reported an issue ( SANDBOX-330 ), an infinite loop with trailing comments, that was already fixed on the trunk. I didn't know about the solr-commons-csv artifact in the central Maven repository until then. So I inspected the jar and gasped in awe at the unmodified namespace. I understand that Maven as a build tool is controversial, I'm myself an heavy Ant user on several projects requiring complex build setups. But Maven, as a way to share code through a central repository, is also a very valuable tool. It's adopted by alternative build tools like Ivy and Gradle. It's also used by IDEs to import dependencies automatically and to be able to browse their source and Javadoc easily. It's difficult to understand why this part is also hated by Maven detractors. The constraints on the central repository aren't that unbearable, the most important rule to follow is to not publish two artifacts identified by distinct groupId:artifactId with conflicting classes. The corollary is that a project A should not publish classes in the namespace of a project B.
          Hide
          Robert Muir added a comment -

          The arguments that this is more than a maven issue are bogus.
          Do you take me for a fool?

          The same arguments could be had for even released dependencies like httpclient-3.0 that we use. What if a user wants to use 4.0?

          To ant, whether its released or unreleased, its just a jar in a /lib and in the users classpath.
          But thats not the real issue here: These people are angry about maven repositories.

          So quit screwing around with crazy ideas about duplicating whole codebases and fix the real problem:

          open a bug report at Maven with the Description:

          Allow maven project A to depend upon project B without B being in maven at all

          Thats the bug, it stops the replication of the virus, and i don't see any reason why maven wouldnt want to fix it? If they object, then it only proves my point that its really the GPLv3 of build tools.

          Show
          Robert Muir added a comment - The arguments that this is more than a maven issue are bogus. Do you take me for a fool? The same arguments could be had for even released dependencies like httpclient-3.0 that we use. What if a user wants to use 4.0? To ant, whether its released or unreleased, its just a jar in a /lib and in the users classpath. But thats not the real issue here: These people are angry about maven repositories. So quit screwing around with crazy ideas about duplicating whole codebases and fix the real problem: open a bug report at Maven with the Description: Allow maven project A to depend upon project B without B being in maven at all Thats the bug, it stops the replication of the virus, and i don't see any reason why maven wouldnt want to fix it? If they object, then it only proves my point that its really the GPLv3 of build tools.
          Hide
          Greg Stein added a comment -

          Maven is a red herring.

          The short answer is that the Lucene PMC is releasing artifacts (to the maven repository) in the org.apache.commons namespace.

          It doesn't matter how the release is done. You're doing it. You have released Commons code to the public. Period. Stop pointing fingers at tools. The Lucene PMC has released code in another project's namespace, which is incompatible with the direction that project wants to take the namespace. The Lucene PMC has no right to usurp the direction of other projects here at the ASF.

          If you cannot get the tool to avoid releasing/interfering with other projects' code, then stop using the tool. Seriously.

          The Lucene PMC's releases should stop interfering with other projects' namespaces. Find a way to fix it. Stop blaming tools. Start fixing.

          Show
          Greg Stein added a comment - Maven is a red herring. The short answer is that the Lucene PMC is releasing artifacts (to the maven repository) in the org.apache.commons namespace. It doesn't matter how the release is done. You're doing it. You have released Commons code to the public. Period. Stop pointing fingers at tools. The Lucene PMC has released code in another project's namespace, which is incompatible with the direction that project wants to take the namespace. The Lucene PMC has no right to usurp the direction of other projects here at the ASF. If you cannot get the tool to avoid releasing/interfering with other projects' code, then stop using the tool. Seriously. The Lucene PMC's releases should stop interfering with other projects' namespaces. Find a way to fix it. Stop blaming tools. Start fixing.
          Hide
          Emmanuel Bourg added a comment -

          HTTClient 4 can coexist with HTTPClient 3 in the same classpath, because they are not in the same package.

          Show
          Emmanuel Bourg added a comment - HTTClient 4 can coexist with HTTPClient 3 in the same classpath, because they are not in the same package.
          Hide
          Grant Ingersoll added a comment - - edited

          I guess I'm missing something. We aren't part of the Commons project. The Commons code is Apache licensed, we are using it as we see fit under the T&C of the ASL. What prevents someone else from going and doing the same thing outside of Apache? In fact, it happens all the time, even at Apache. Commons CLI 2 is the biggest one, as I know of several projects that "release" that b/c Commons itself never has. Here are also several who do it for Lucene. The problem is Maven and it isn't going away.

          Here are some examples:

          1. http://search.maven.org/#artifactdetails%7Corg.hibernate.apache.lucene.solr%7Capache-solr-analyzer%7C1.2.0%7Cjar
          2. http://search.maven.org/#artifactdetails%7Corg.apache.clerezza.ext%7Corg.apache.lucene%7C0.1-incubating%7Cbundle

          Tomcat does the same thing to Commons (file upload, for example):

          1. http://search.maven.org/#search%7Cga%7C4%7Ccommons

          I could keep going. Just do searches for popular Apache libraries in Maven and you will see what I mean. Maven is not a red herring. It is the problem.

          Show
          Grant Ingersoll added a comment - - edited I guess I'm missing something. We aren't part of the Commons project. The Commons code is Apache licensed, we are using it as we see fit under the T&C of the ASL. What prevents someone else from going and doing the same thing outside of Apache? In fact, it happens all the time, even at Apache. Commons CLI 2 is the biggest one, as I know of several projects that "release" that b/c Commons itself never has. Here are also several who do it for Lucene. The problem is Maven and it isn't going away. Here are some examples: http://search.maven.org/#artifactdetails%7Corg.hibernate.apache.lucene.solr%7Capache-solr-analyzer%7C1.2.0%7Cjar http://search.maven.org/#artifactdetails%7Corg.apache.clerezza.ext%7Corg.apache.lucene%7C0.1-incubating%7Cbundle Tomcat does the same thing to Commons (file upload, for example): http://search.maven.org/#search%7Cga%7C4%7Ccommons I could keep going. Just do searches for popular Apache libraries in Maven and you will see what I mean. Maven is not a red herring. It is the problem.
          Hide
          Emmanuel Bourg added a comment - - edited

          That's not correct for Tomcat, the Commons components in Tomcat have a different namespace to avoid classpath conflicts.

          Clerezza releases will have to be changed too.

          Hibernate probably made a mistake in 2008, they haven't released a solr artifact since then, so I guess they fixed the issue.

          EDIT: Indeed Hibernate fixed the dependency in Hibernate Search 3.3.0 Beta2, the dependency was introduced with the Beta 1

          Show
          Emmanuel Bourg added a comment - - edited That's not correct for Tomcat, the Commons components in Tomcat have a different namespace to avoid classpath conflicts. Clerezza releases will have to be changed too. Hibernate probably made a mistake in 2008, they haven't released a solr artifact since then, so I guess they fixed the issue. EDIT: Indeed Hibernate fixed the dependency in Hibernate Search 3.3.0 Beta2, the dependency was introduced with the Beta 1
          Hide
          Grant Ingersoll added a comment - - edited

          More that do it:
          Geronimo (Apache) (Digester, Discovery)
          JVNet-Hudson (commons-jelly)
          com.kenai.nbpwr
          ServiceMix (commons-csv)
          Apache Directory does this weird thing where they package the Commons jars inside of another jar. Not sure what that does, but the nested jar is Commons. Not quite sure how the classloader would deal with that.
          Mahout does it for Commons-CLI2 (b/c commons never seems to want to release it)

          And that is just me searching for commons.

          BTW, I don't actually think the problem is Maven. The problem is Java's classloader is broken. The way to fix it is to use a classloader that scopes dependencies properly. Otherwise, this is simply a known problem with Java and the solution is that one has to be careful about what versions of things one imports. It's exactly the same issue as when one has two dependencies that each have dependencies on different versions of commons-lang or Lucene or whats-it-called.jar.

          In other words, I don't think this is worth us doing anything different than what we are already doing. Except, for seeing if we can upgrade our version of Commons CSV to an official release as a courtesy.

          Show
          Grant Ingersoll added a comment - - edited More that do it: Geronimo (Apache) (Digester, Discovery) JVNet-Hudson (commons-jelly) com.kenai.nbpwr ServiceMix (commons-csv) Apache Directory does this weird thing where they package the Commons jars inside of another jar. Not sure what that does, but the nested jar is Commons. Not quite sure how the classloader would deal with that. Mahout does it for Commons-CLI2 (b/c commons never seems to want to release it) And that is just me searching for commons. BTW, I don't actually think the problem is Maven. The problem is Java's classloader is broken. The way to fix it is to use a classloader that scopes dependencies properly. Otherwise, this is simply a known problem with Java and the solution is that one has to be careful about what versions of things one imports. It's exactly the same issue as when one has two dependencies that each have dependencies on different versions of commons-lang or Lucene or whats-it-called.jar. In other words, I don't think this is worth us doing anything different than what we are already doing. Except, for seeing if we can upgrade our version of Commons CSV to an official release as a courtesy.
          Hide
          Michael McCandless added a comment -

          There are two very different issues mixed in here:

          Issue 1: a private, internal dependency in Solr on commons-csv (and
          others) has become "publicly released" in Maven as solr-commons-csv.
          I know very little about Maven, but this does sound bad. That said, I
          don't think Maven has any right to "peek in" and force Solr's private
          dependencies to become public; what deps Solr uses is an
          implementation detail of Solr. Short of removing Solr from Maven, is
          there some way to publish Solr's WAR through Maven while keeping (some
          of?) its dependencies private?

          Issue 2: JAR hell (runtime CLASSPATH pollution): this is an age old,
          widespread issue, completely orthogonal to Maven. Regardless of how a
          project builds its artifacts, if it has external dependencies, it will
          of course "pollute" the CLASSPATH. This is a well known problem.

          Really, we should open a new issue for issue 2: it has nothing to do
          with Maven and is long-standing (our ant-produced artifacts have had
          the issue forever). In practice, this has never seemed to be a
          problem to users, presumably because users don't usually embed Solr.
          Normally users deploy the WAR and use as-is, except for plugins. If
          we really think this needs to be fixed (I'm doubtful), there are
          various options... eg OSGI seems to "solve" this, but adds complexity.
          Jarjar, minijar, ant replace, are also options, but I don't like the
          risk of hard-to-explain bugs. We can discuss on a separate issue...

          Show
          Michael McCandless added a comment - There are two very different issues mixed in here: Issue 1: a private, internal dependency in Solr on commons-csv (and others) has become "publicly released" in Maven as solr-commons-csv. I know very little about Maven, but this does sound bad. That said, I don't think Maven has any right to "peek in" and force Solr's private dependencies to become public; what deps Solr uses is an implementation detail of Solr. Short of removing Solr from Maven, is there some way to publish Solr's WAR through Maven while keeping (some of?) its dependencies private? Issue 2: JAR hell (runtime CLASSPATH pollution): this is an age old, widespread issue, completely orthogonal to Maven. Regardless of how a project builds its artifacts, if it has external dependencies, it will of course "pollute" the CLASSPATH. This is a well known problem. Really, we should open a new issue for issue 2: it has nothing to do with Maven and is long-standing (our ant-produced artifacts have had the issue forever). In practice, this has never seemed to be a problem to users, presumably because users don't usually embed Solr. Normally users deploy the WAR and use as-is, except for plugins. If we really think this needs to be fixed (I'm doubtful), there are various options... eg OSGI seems to "solve" this, but adds complexity. Jarjar, minijar, ant replace, are also options, but I don't like the risk of hard-to-explain bugs. We can discuss on a separate issue...
          Hide
          Grant Ingersoll added a comment -

          That's not correct for Tomcat, the Commons components in Tomcat have a different namespace to avoid classpath conflicts.

          Not in the ones I downloaded.

          Show
          Grant Ingersoll added a comment - That's not correct for Tomcat, the Commons components in Tomcat have a different namespace to avoid classpath conflicts. Not in the ones I downloaded.
          Show
          Emmanuel Bourg added a comment - Grant, get the latest ones: http://www.jarvana.com/jarvana/inspect/org/apache/tomcat/tomcat-dbcp/7.0.23/tomcat-dbcp-7.0.23.jar
          Hide
          Mike Sokolov added a comment - - edited

          There is no issue for users of the WAR, I think, which will generally be external to a project, and handles all of these bundling issues internally.

          The issue arises in projects that depend on the various solr jars that are bundled in its WAR. Using Maven, it's as if you exploded the WAR, pulled all its jars into your project, and then added your own code too. As an example, we've done this in a project where I work so we can run EmbeddedSolrServer in our tests. I have it on my TODO list to break this dependency though since it's overkill to include all of solr in our project for this convenience.

          Show
          Mike Sokolov added a comment - - edited There is no issue for users of the WAR, I think, which will generally be external to a project, and handles all of these bundling issues internally. The issue arises in projects that depend on the various solr jars that are bundled in its WAR. Using Maven, it's as if you exploded the WAR, pulled all its jars into your project, and then added your own code too. As an example, we've done this in a project where I work so we can run EmbeddedSolrServer in our tests. I have it on my TODO list to break this dependency though since it's overkill to include all of solr in our project for this convenience.
          Hide
          Uwe Schindler added a comment -

          As noted before, we cannot fix this for Jetty 6 by package-renaming using JarJar, as this would break almost everything in the jetty configuration (Jetty is JarJar unfriendly), but Jetty can be easily solved on our side by upgrading to Jetty 8 (also for Solr 3.6!!!!). The patch/branch is ready, just needs committing. We can then ship with unmodified Jetty 8, as the Unicode bugs should be fixed.

          For the other projects I see no real problem in using jarjar (which is used in lots of projects): Lets simply investigate what foreign namespaces we use in our packages (e.g. we should also rename the Snowball classes, because we heavily forked them and they are still in org.tartarus namespace - as we forked that project, we should do it in the source code). For e.g. commons-csv we have no problem jarjaring our fork and re-release it. In the jarjar issue tracker is also a discussion about “hiding” renamed classes from GUIs by prefixing all renamed class names with $, so they are invisible for an autocomplete-using Eclipse developer, so nobody outside can use the jarjared classes outside Lucene/Solr.

          About the remaining issues Robert mentioned: They are all solvable. JarJar renames also class names inside strings used for reflection, it can also rewrite Manifests and META-INF (we use e.g. for codecs, or Apache Xerces uses to plug in the parser impl). I just would like to give it a try, it cannot happen more than our testcases do not run. The example Robert has given with external references to icu4j is exactly what we want to prevent with JarJaring, so it doesn’t apply in my opinion. If we patch and modify icu4j and ship with a package-renamed version, code that uses reflection outside the Solr/Lucene JAR is completely unaffected. If the external project side-by-side with Lucene/Solr wants to use icu4j it can simply include the original in any version, solr with its patched version is completely unaffected and vice versa.

          So I would strongly suggest to jarjar all dependencies that we modified locally (means all foreign jar-files with solr-prefix). How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

          Show
          Uwe Schindler added a comment - As noted before, we cannot fix this for Jetty 6 by package-renaming using JarJar, as this would break almost everything in the jetty configuration (Jetty is JarJar unfriendly), but Jetty can be easily solved on our side by upgrading to Jetty 8 (also for Solr 3.6!!!!). The patch/branch is ready, just needs committing. We can then ship with unmodified Jetty 8, as the Unicode bugs should be fixed. For the other projects I see no real problem in using jarjar (which is used in lots of projects): Lets simply investigate what foreign namespaces we use in our packages (e.g. we should also rename the Snowball classes, because we heavily forked them and they are still in org.tartarus namespace - as we forked that project, we should do it in the source code). For e.g. commons-csv we have no problem jarjaring our fork and re-release it. In the jarjar issue tracker is also a discussion about “hiding” renamed classes from GUIs by prefixing all renamed class names with $, so they are invisible for an autocomplete-using Eclipse developer, so nobody outside can use the jarjared classes outside Lucene/Solr. About the remaining issues Robert mentioned: They are all solvable. JarJar renames also class names inside strings used for reflection, it can also rewrite Manifests and META-INF (we use e.g. for codecs, or Apache Xerces uses to plug in the parser impl). I just would like to give it a try, it cannot happen more than our testcases do not run. The example Robert has given with external references to icu4j is exactly what we want to prevent with JarJaring, so it doesn’t apply in my opinion. If we patch and modify icu4j and ship with a package-renamed version, code that uses reflection outside the Solr/Lucene JAR is completely unaffected. If the external project side-by-side with Lucene/Solr wants to use icu4j it can simply include the original in any version, solr with its patched version is completely unaffected and vice versa. So I would strongly suggest to jarjar all dependencies that we modified locally (means all foreign jar-files with solr-prefix). How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.
          Hide
          Robert Muir added a comment -

          How would this work for e.g. the carrot2 example? The carrot2 class names are referenced in user configuration files (solrconfig.xml):

          For example:

              <lst name="engine">
                <!-- The name, only one can be named "default" -->
                <str name="name">default</str>
                <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
              </lst>
          
          Show
          Robert Muir added a comment - How would this work for e.g. the carrot2 example? The carrot2 class names are referenced in user configuration files (solrconfig.xml): For example: <lst name="engine"> <!-- The name, only one can be named "default" --> <str name="name">default</str> <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str> </lst>
          Hide
          Robert Muir added a comment -

          Also the UIMA integration has the same issue, configuration files contain their qualified class names, for example:

          <type allAnnotatorFeatures="true">org.apache.uima.SentenceAnnotation</type>
          
          Show
          Robert Muir added a comment - Also the UIMA integration has the same issue, configuration files contain their qualified class names, for example: <type allAnnotatorFeatures="true">org.apache.uima.SentenceAnnotation</type>
          Hide
          Emmanuel Bourg added a comment -

          How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

          It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.

          Show
          Emmanuel Bourg added a comment - How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build. It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.
          Hide
          Mark Miller added a comment -

          As Mike and Yonik mentioned - most of us don't know much about maven. I still wish it was a downstream issue. I still wish we would have voted on supporting it. With all due respect to Steve and his great work around it. I really wish it was not our problem - as I know many others do.

          What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.

          And -1 for continuing Maven support within the project for what its worth (Though I love your work and dedication to this Steve )

          Show
          Mark Miller added a comment - As Mike and Yonik mentioned - most of us don't know much about maven. I still wish it was a downstream issue. I still wish we would have voted on supporting it. With all due respect to Steve and his great work around it. I really wish it was not our problem - as I know many others do. What was the issue with just using a released version of csv? This issue is so long already, its hard to find it. And -1 for continuing Maven support within the project for what its worth (Though I love your work and dedication to this Steve )
          Hide
          Robert Muir added a comment -

          What was the issue with just using a released version of csv? This issue is so long already, its hard to find it.

          I think the major issue is that there is not yet a released version. I think we would definitely prefer that.

          Speaking from some experience: In my opinion CSV is deceptively complex and is a perfect example of where we
          would rather 'downstream' that knowledge to someone who focuses on it and gives good performance.
          Rolling our own is dangerous.

          But there is not yet any official releases to use...

          Show
          Robert Muir added a comment - What was the issue with just using a released version of csv? This issue is so long already, its hard to find it. I think the major issue is that there is not yet a released version. I think we would definitely prefer that. Speaking from some experience: In my opinion CSV is deceptively complex and is a perfect example of where we would rather 'downstream' that knowledge to someone who focuses on it and gives good performance. Rolling our own is dangerous. But there is not yet any official releases to use...
          Hide
          Uwe Schindler added a comment - - edited

          How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build.

          It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily.

          That's our committers decision. The fact here is, that Solr or even Lucene is not just a WAR file, its consisting of lot's of unrelated JAR artifacts and all of them should live on its own, so a user can remove features he does not need. So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that).

          Show
          Uwe Schindler added a comment - - edited How we do this is a different issue: We can only rename all classes in that jar-file and republish it (but then we have to change our import statements in our code – I would prefer this), or we jarjar the whole Solr distribution after the complete build. It's preferable to post process the Solr jar rather than renaming the packages imported in the source code. That's less intrusive and let you upgrade to the official fixed dependency more easily. That's our committers decision. The fact here is, that Solr or even Lucene is not just a WAR file, its consisting of lot's of unrelated JAR artifacts and all of them should live on its own, so a user can remove features he does not need. So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that).
          Hide
          Mark Miller added a comment -

          But there is not yet any official releases to use...

          Ah, interesting. Perhaps we could get the favor of a release

          It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.

          Show
          Mark Miller added a comment - But there is not yet any official releases to use... Ah, interesting. Perhaps we could get the favor of a release It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.
          Hide
          Steve Rowe added a comment -

          So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that).

          +1

          Show
          Steve Rowe added a comment - So I prefer to "transform" those "changed" artifacts separately to private "namespace" and simply change maybe 3 import statements (I can even use "sed" for that). +1
          Hide
          Emmanuel Bourg added a comment -

          That was the purpose of my patch Mark

          Show
          Emmanuel Bourg added a comment - That was the purpose of my patch Mark
          Hide
          Robert Muir added a comment -

          It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws.

          Yes while this is the case, and as Emmanuel states, the purpose of his patch, its not a scalable solution in general.
          Its also expensive to fork software... there is always the danger that we then become out-of-sync and never sync back up.

          (feel free to hit me, for Uwe's example of snowball, which is a perfect example)

          What about the other cases that fall into this same 'maven namespace' category, e.g. the UIMA case for example?

          How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
          Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.

          Show
          Robert Muir added a comment - It's 5 classes it looks like - and it's Apache licensed and there is no release - can't we simply suck this into our code base with a readme and JIRA about removing it and switching to a release when one occurs? Get the whole dependency thing right out of Maven's claws. Yes while this is the case, and as Emmanuel states, the purpose of his patch, its not a scalable solution in general. Its also expensive to fork software... there is always the danger that we then become out-of-sync and never sync back up. (feel free to hit me, for Uwe's example of snowball, which is a perfect example) What about the other cases that fall into this same 'maven namespace' category, e.g. the UIMA case for example? How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart: Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.
          Hide
          Robert Muir added a comment -

          And for the record, i want to withdraw my opposition to Emmanuel's patch.

          I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it.

          Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want
          to see us depending upon a released version of this jar file.

          Show
          Robert Muir added a comment - And for the record, i want to withdraw my opposition to Emmanuel's patch. I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it. Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want to see us depending upon a released version of this jar file.
          Hide
          Mark Miller added a comment -

          That was the purpose of my patch Mark

          +1 - I think this is the way to go, I just haven't understood any objections to it yet.

          I still feel strongly this doesn't solve the 'real problem',

          Hate to sound like a broken record jerk, but again I think the real problem is our support of Maven

          I don't see how, short of dropping internal Maven support, we solve the real problem. And barring a full glorious solution, this is a small and simple thing we can do that makes the commons guys happy. If all of a sudden we are bombarded by issues like these, I guess we can burn that bridge later. And drop Maven

          Show
          Mark Miller added a comment - That was the purpose of my patch Mark +1 - I think this is the way to go, I just haven't understood any objections to it yet. I still feel strongly this doesn't solve the 'real problem', Hate to sound like a broken record jerk, but again I think the real problem is our support of Maven I don't see how, short of dropping internal Maven support, we solve the real problem. And barring a full glorious solution, this is a small and simple thing we can do that makes the commons guys happy. If all of a sudden we are bombarded by issues like these, I guess we can burn that bridge later. And drop Maven
          Hide
          Uwe Schindler added a comment -

          I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it.

          Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want
          to see us depending upon a released version of this jar file.

          To prevent this I would prefer jarjar, its not risky here at all (and also not for other usitility libs from commons). I am already plaing around with that. What I did here was repackage the JAR file and used a simple serach/replace on our source files to refer to new package name. Not even maven config has to be changedm Steven Rowe can publish the solr-commons-csv.jar as he likes.

          Show
          Uwe Schindler added a comment - I still feel strongly this doesn't solve the 'real problem', but if it makes the commons-csv developers happy, lets do it. Though I think its fair to say I will speak up if i see any commits/local modifications to this, because ultimately I want to see us depending upon a released version of this jar file. To prevent this I would prefer jarjar, its not risky here at all (and also not for other usitility libs from commons). I am already plaing around with that. What I did here was repackage the JAR file and used a simple serach/replace on our source files to refer to new package name. Not even maven config has to be changedm Steven Rowe can publish the solr-commons-csv.jar as he likes.
          Hide
          Robert Muir added a comment -

          Uwe: maybe we should do it just for this case (rather than offering a general solution?)

          Basically what i'm trying to say is that the general problem is complex, I think there are enough examples (jetty, carrot2, uima, etc)
          that fall in this category that aren't really solved by jarjar. I think we either need to solve this issue as a 'case-by-case' for commons-csv,
          or fix the entire problem (which definitely, definitely, certainly, without a doubt, involves fixing maven).

          But to be practical, lets come up with something so that commons-csv developers are happy and can issue a release (which we can then depend upon).

          Keeping in mind that this is supposed to be a 3.6 minor release, can we apply the jarjar solution only to commons-csv?
          I feel the other situations are more dangerous and I think we should be careful, but at the same time we should address
          their concerns and put ourselves in a situation where we don't end out 'forking' it... I don't want to see that.

          Show
          Robert Muir added a comment - Uwe: maybe we should do it just for this case (rather than offering a general solution?) Basically what i'm trying to say is that the general problem is complex, I think there are enough examples (jetty, carrot2, uima, etc) that fall in this category that aren't really solved by jarjar. I think we either need to solve this issue as a 'case-by-case' for commons-csv, or fix the entire problem (which definitely, definitely, certainly, without a doubt, involves fixing maven). But to be practical, lets come up with something so that commons-csv developers are happy and can issue a release (which we can then depend upon). Keeping in mind that this is supposed to be a 3.6 minor release, can we apply the jarjar solution only to commons-csv? I feel the other situations are more dangerous and I think we should be careful, but at the same time we should address their concerns and put ourselves in a situation where we don't end out 'forking' it... I don't want to see that.
          Hide
          Mark Miller added a comment -

          can we apply the jarjar solution only to commons-csv?

          +1

          Show
          Mark Miller added a comment - can we apply the jarjar solution only to commons-csv? +1
          Hide
          Steve Rowe added a comment -

          I think there are enough examples (jetty, carrot2, uima, etc) [...]

          We are not currently publishing third party solr-uima-* artifacts, though we have in the past (versions prior to 3.5.0).

          Show
          Steve Rowe added a comment - I think there are enough examples (jetty, carrot2, uima, etc) [...] We are not currently publishing third party solr-uima-* artifacts, though we have in the past (versions prior to 3.5.0).
          Hide
          Uwe Schindler added a comment - - edited

          Here the patch and the transformed JAR file (and renamed). Most of the patch is my rename of the JAR file to solr- prefix. Code changes is 5 lines of import statements!

          What I did:

          • copy the original JAR file to a folder
          • place jarjar-1.2.jar somewhere
          • edit the rules.txt file and change package names
          • run java -jar jarjar-1.2.jar process rule.txt commons-csv-1.0-SNAPSHOT-r966014.jar solr-commons-csv-1.0-SNAPSHOT-r966014.jar
          • place the resulting jar file in solr/core/lib (remove the old one before) and commit it like any other bundled artifact

          All tests pass and the supplied JAR file could be published in maven with the name sarowe gave it.

          Show
          Uwe Schindler added a comment - - edited Here the patch and the transformed JAR file (and renamed). Most of the patch is my rename of the JAR file to solr- prefix. Code changes is 5 lines of import statements! What I did: copy the original JAR file to a folder place jarjar-1.2.jar somewhere edit the rules.txt file and change package names run java -jar jarjar-1.2.jar process rule.txt commons-csv-1.0-SNAPSHOT-r966014.jar solr-commons-csv-1.0-SNAPSHOT-r966014.jar place the resulting jar file in solr/core/lib (remove the old one before) and commit it like any other bundled artifact All tests pass and the supplied JAR file could be published in maven with the name sarowe gave it.
          Hide
          Robert Muir added a comment -

          Thanks Uwe: i support your change. I think it is the best compromise.

          it seems to me, because the jar is "pre-jar-jar'ed" we don't fall into various
          IDE configuration conflicts either right? Its just as easy for developers to configure
          their workspace (ant eclipse, ant idea)

          +1

          Show
          Robert Muir added a comment - Thanks Uwe: i support your change. I think it is the best compromise. it seems to me, because the jar is "pre-jar-jar'ed" we don't fall into various IDE configuration conflicts either right? Its just as easy for developers to configure their workspace (ant eclipse, ant idea) +1
          Hide
          Steve Rowe added a comment -

          +1 for Uwe's JarJar fix.

          Show
          Steve Rowe added a comment - +1 for Uwe's JarJar fix.
          Hide
          Uwe Schindler added a comment -

          The keep line is not even needed in the rules.txt. There is one small issue: The resulting JAR file contains an empty org.apache.commons folder and some maven relicts (die,die,die), but thats minor.

          Show
          Uwe Schindler added a comment - The keep line is not even needed in the rules.txt. There is one small issue: The resulting JAR file contains an empty org.apache.commons folder and some maven relicts (die,die,die), but thats minor.
          Hide
          Uwe Schindler added a comment -

          Updated patch to make it mor consistent with the naming conventions of Solr.

          I think we should do the same for the other file in lib: noggit

          Steven Rowe: What other modified JARs do we have that are republished through maven?

          Show
          Uwe Schindler added a comment - Updated patch to make it mor consistent with the naming conventions of Solr. I think we should do the same for the other file in lib: noggit Steven Rowe: What other modified JARs do we have that are republished through maven?
          Hide
          Dawid Weiss added a comment -

          Don't jarjar Carrot2, please. We are plugging in our commercial clustering engine into Solr via Carrot2 controllers so if you change package namespace we will no longer be able to do it.

          Show
          Dawid Weiss added a comment - Don't jarjar Carrot2, please. We are plugging in our commercial clustering engine into Solr via Carrot2 controllers so if you change package namespace we will no longer be able to do it.
          Hide
          Steve Rowe added a comment - - edited

          Steven Rowe: What other modified JARs do we have that are republished through maven?

          For branch_3x:

          module jar
          Lucene benchmark lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ-1257.jar
          Solr clustering solr/contrib/clustering/carrot2-core-3.5.0.jar
          Solr langid solr/contrib/langid/lib/jsonic-1.2.0.jar
          Solr langid solr/contrib/langid/lib/langdetect-r111-java5.jar
          Solr core solr/lib/apache-solr-noggit-r1099557.jar
          Solr core solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar

          Trunk is the same, except that the carrot2-core jar is no longer required (it's required on branch_3x because it's a specially Java5-compiled jar, but no longer on trunk/4.0, which requires Java 6, and so can use the Maven central artifact.)

          Show
          Steve Rowe added a comment - - edited Steven Rowe: What other modified JARs do we have that are republished through maven? For branch_3x: module jar Lucene benchmark lucene/contrib/benchmark/lib/xercesImpl-2.9.1-patched-XERCESJ-1257.jar Solr clustering solr/contrib/clustering/carrot2-core-3.5.0.jar Solr langid solr/contrib/langid/lib/jsonic-1.2.0.jar Solr langid solr/contrib/langid/lib/langdetect-r111-java5.jar Solr core solr/lib/apache-solr-noggit-r1099557.jar Solr core solr/lib/commons-csv-1.0-SNAPSHOT-r966014.jar Trunk is the same, except that the carrot2-core jar is no longer required (it's required on branch_3x because it's a specially Java5-compiled jar, but no longer on trunk/4.0, which requires Java 6, and so can use the Maven central artifact.)
          Hide
          Uwe Schindler added a comment -

          According to Steven, we don't republish them so there is no need to do this.

          Show
          Uwe Schindler added a comment - According to Steven, we don't republish them so there is no need to do this.
          Hide
          Uwe Schindler added a comment -

          For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...

          Show
          Uwe Schindler added a comment - For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...
          Hide
          Robert Muir added a comment -

          Well the carrot is similar to the uima case, in both cases we have their committers also as committers working
          on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?

          Show
          Robert Muir added a comment - Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?
          Hide
          Michael McCandless added a comment -

          How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart:
          Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B.

          +1, this sums up the Maven issue.

          But is it really true? Is there really no way to stop Maven from
          making project A's private deps public? This is horribly invasive.

          CLASSPATH polution is a separate, widespread, longstanding and very
          real issue. I don't think jarjar'ing every dependency Solr (or our
          modules, contribs, etc.) is a good general solution.

          But I think it's OK in this case... progress not perfection. Your
          patch looks good Uwe, thanks!

          Show
          Michael McCandless added a comment - How to fix the general problem? Thats why I think that the only proper way to fix the bug is to attack it at the heart: Thats the fact that for maven project A to depend upon project B that is not in maven, maven project A must "publish" some "fake maven release" of project B. +1, this sums up the Maven issue. But is it really true? Is there really no way to stop Maven from making project A's private deps public? This is horribly invasive. CLASSPATH polution is a separate, widespread, longstanding and very real issue. I don't think jarjar'ing every dependency Solr (or our modules, contribs, etc.) is a good general solution. But I think it's OK in this case... progress not perfection. Your patch looks good Uwe, thanks!
          Hide
          Ryan McKinley added a comment -

          +1 thanks Uwe

          I don't think jarjar'ing every dependency Solr

          The only jar files in question are snapshot/unreleased code. The commons-csv problem is a non issue if we were using a released version.

          Show
          Ryan McKinley added a comment - +1 thanks Uwe I don't think jarjar'ing every dependency Solr The only jar files in question are snapshot/unreleased code. The commons-csv problem is a non issue if we were using a released version.
          Hide
          Michael McCandless added a comment -

          The commons-csv problem is a non issue if we were using a released version.

          Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem?

          For JAR hell, I thought even released versions can cause problems too?

          Ie, we include version X of a dep, and the app embedding solr is using version Y, and they may conflict?

          Ie the classic problem: http://en.wikipedia.org/wiki/Java_Classloader#JAR_hell

          Show
          Michael McCandless added a comment - The commons-csv problem is a non issue if we were using a released version. Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem? For JAR hell, I thought even released versions can cause problems too? Ie, we include version X of a dep, and the app embedding solr is using version Y, and they may conflict? Ie the classic problem: http://en.wikipedia.org/wiki/Java_Classloader#JAR_hell
          Hide
          Tommaso Teofili added a comment -

          Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it?

          also starting from 3.5.0 the UIMA dependencies' jars are released artifacts (see SOLR-2746 and thus http://mvnrepository.com/artifact/org.apache.solr/solr-uima/3.5.0 )

          Show
          Tommaso Teofili added a comment - Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it? also starting from 3.5.0 the UIMA dependencies' jars are released artifacts (see SOLR-2746 and thus http://mvnrepository.com/artifact/org.apache.solr/solr-uima/3.5.0 )
          Hide
          Uwe Schindler added a comment -

          For JAR hell, I thought even released versions can cause problems too?

          Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

          Show
          Uwe Schindler added a comment - For JAR hell, I thought even released versions can cause problems too? Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.
          Hide
          Tommaso Teofili added a comment -

          The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

          yes, that's also my understanding of this issue with unreleased dependencies' jars.

          Show
          Tommaso Teofili added a comment - The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things. yes, that's also my understanding of this issue with unreleased dependencies' jars.
          Hide
          Ryan McKinley added a comment -

          Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem?

          I see this entirely as as JAR hell problem. Maven just makes it easier to discover.

          For JAR hell, I thought even released versions can cause problems too?

          Of course - but this is an expected side-effect of a release. This whole discussion is because the commons-csv people never agreed to support the classnames solr is using. If they request we change the name, it seems only appropriate that we do.

          Show
          Ryan McKinley added a comment - Wait, are you talking about the Maven issue ("my private deps become publicly released to Maven"), or the more general JAR hell problem? I see this entirely as as JAR hell problem. Maven just makes it easier to discover. For JAR hell, I thought even released versions can cause problems too? Of course - but this is an expected side-effect of a release. This whole discussion is because the commons-csv people never agreed to support the classnames solr is using. If they request we change the name, it seems only appropriate that we do.
          Hide
          Dawid Weiss added a comment -

          For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different...

          It's not that simple. We use 1.6 APIs and we have libraries that are compiled under 1.6. The version embedded in Solr is patched and backported using retroweaver if I remember correctly. Like I said, I don't have a problem if you just publish that particular artefact as is under Solr's groupid. I don't see a problem here, really. Classpath conflicts do happen but it's easier to recover from a classpath conflict than from a jarjared file which happens to use runtime reflection or something like that (just replacing strings is often not enough if class names are constructed dynamically).

          Show
          Dawid Weiss added a comment - For carrot: If you can just compile carrot with Java 5 and produce Java 5 JARs, why does Dawid/Carrot not do this? A Java 6 class/jar file is not really useful if just the version number of the file header is different... It's not that simple. We use 1.6 APIs and we have libraries that are compiled under 1.6. The version embedded in Solr is patched and backported using retroweaver if I remember correctly. Like I said, I don't have a problem if you just publish that particular artefact as is under Solr's groupid. I don't see a problem here, really. Classpath conflicts do happen but it's easier to recover from a classpath conflict than from a jarjared file which happens to use runtime reflection or something like that (just replacing strings is often not enough if class names are constructed dynamically).
          Hide
          Benson Margulies added a comment - - edited

          So, I hear that you all wish that you had more control over the dependencies that you propagate when you publish to maven central. Perhaps I could help?

          Option 1, which you've already detected, is to use jarjar or shade so that you can publish your fork of someone else's stuff in your own package namespace fairly painlessly, either storing it inside some other jar of yours or publishing it under your coordinates.

          Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

          This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.

          Now, if you don't publish one of these, anyone who uses your published artifact will get past the maven dependency graph – and then sink or swim based on whether they've made their own arrangements to get the code into their classpath. I can't help wondering how this helps with the original complaint of the Common(s)-Folk: if you release a version of Lucene that requires a forked version of Commons, and you don't change the packages, you're setting up for a classpath conflict, sooner or later, with or without maven, aren't you?

          Option 3 is simply to observe that your users can always use 'exclude' to cut off following some dependency link of yours that doesn't appeal to them.

          Does any of this help?

          p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.

          Show
          Benson Margulies added a comment - - edited So, I hear that you all wish that you had more control over the dependencies that you propagate when you publish to maven central. Perhaps I could help? Option 1, which you've already detected, is to use jarjar or shade so that you can publish your fork of someone else's stuff in your own package namespace fairly painlessly, either storing it inside some other jar of yours or publishing it under your coordinates. Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html . This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control. Now, if you don't publish one of these, anyone who uses your published artifact will get past the maven dependency graph – and then sink or swim based on whether they've made their own arrangements to get the code into their classpath. I can't help wondering how this helps with the original complaint of the Common(s)-Folk: if you release a version of Lucene that requires a forked version of Commons, and you don't change the packages, you're setting up for a classpath conflict, sooner or later, with or without maven, aren't you? Option 3 is simply to observe that your users can always use 'exclude' to cut off following some dependency link of yours that doesn't appeal to them. Does any of this help? p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.
          Hide
          Uwe Schindler added a comment -

          p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process.

          jarjar does the same. Shade is only available to Maven so not really useable for us.

          Show
          Uwe Schindler added a comment - p.s. I don't know much about jarjar, but I know that shade has a lot of fancy options that can, in some cases, compensate for reflection or other problems with the package rename process. jarjar does the same. Shade is only available to Maven so not really useable for us.
          Hide
          Uwe Schindler added a comment -

          Dawid: Thanks for insight!

          Show
          Uwe Schindler added a comment - Dawid: Thanks for insight!
          Hide
          Uwe Schindler added a comment -

          I prepared my checkout for committing this. It seems that we all agree that this is, until commons-csv is released, the simpliest option. I cleaned up the JAR file a little bit and removed the empty commons directory and removed the wrong maven metadata. I also added a note to NOTICE.txt, that this JAR file is a prerelease of commons-csv and was jarjared to an internal solr package name.

          Any objections?

          Show
          Uwe Schindler added a comment - I prepared my checkout for committing this. It seems that we all agree that this is, until commons-csv is released, the simpliest option. I cleaned up the JAR file a little bit and removed the empty commons directory and removed the wrong maven metadata. I also added a note to NOTICE.txt, that this JAR file is a prerelease of commons-csv and was jarjared to an internal solr package name. Any objections?
          Hide
          Uwe Schindler added a comment -

          Here my final patch and cleaned up jar.

          Show
          Uwe Schindler added a comment - Here my final patch and cleaned up jar.
          Hide
          Michael McCandless added a comment -

          Any objections?

          +1 to commit.

          But can you please open a follow-on issue that once commons-csv is released we should upgrade...?

          Show
          Michael McCandless added a comment - Any objections? +1 to commit. But can you please open a follow-on issue that once commons-csv is released we should upgrade...?
          Hide
          Steve Rowe added a comment -

          Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

          This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control.

          I don't like the fact that taking this approach for all of our non-Mavenized dependencies (by which I mean those that are not in Maven Central) would put a much greater burden on our users that consume Lucene/Solr artifacts via Maven.

          Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users.

          -1 from me for using optional dependencies to counter Maven's virality.

          Show
          Steve Rowe added a comment - Option 2 is optional dependencies: http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html . This is a way to force the users of your artifacts to explicitly declare some or all of your dependencies, rather than getting them automatically. In other words, virality control. I don't like the fact that taking this approach for all of our non-Mavenized dependencies (by which I mean those that are not in Maven Central) would put a much greater burden on our users that consume Lucene/Solr artifacts via Maven. Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms , then run mvn -N -Pbootstrap install . Neither of these fall within the expected level of effort for Maven users. -1 from me for using optional dependencies to counter Maven's virality.
          Hide
          Uwe Schindler added a comment -

          But can you please open a follow-on issue that once commons-csv is released we should upgrade...?

          Of course!

          Show
          Uwe Schindler added a comment - But can you please open a follow-on issue that once commons-csv is released we should upgrade...? Of course!
          Hide
          Emmanuel Bourg added a comment -

          Uwe you can also remove the writer package in the jar, Solr doesn't use it and it has been removed from Commons CSV.

          Show
          Emmanuel Bourg added a comment - Uwe you can also remove the writer package in the jar, Solr doesn't use it and it has been removed from Commons CSV.
          Hide
          Uwe Schindler added a comment -

          I think I leave it as it is for now. The jar file name contains the revision no of the Commons-CSV, so it should be unmodified (exept the package name).

          Once Commons-CSV is released, we will drop the JAR file and replace by the official dependency.

          Show
          Uwe Schindler added a comment - I think I leave it as it is for now. The jar file name contains the revision no of the Commons-CSV, so it should be unmodified (exept the package name). Once Commons-CSV is released, we will drop the JAR file and replace by the official dependency.
          Hide
          Benson Margulies added a comment -

          In the unlikely event that you bump into something that makes shade attractive, let me know. I'll make you an ant task for it.

          Show
          Benson Margulies added a comment - In the unlikely event that you bump into something that makes shade attractive, let me know. I'll make you an ant task for it.
          Hide
          Michael McCandless added a comment -

          For JAR hell, I thought even released versions can cause problems too?

          Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

          I see. So then we "subverted" Maven's ability to catch the
          conflicting versions of the same dep, by not using the same
          coordinates that commons-csv will eventually use, once they release?

          Given that, as crazy as this sounds, maybe we should have published
          using the same coordinates? Ie this way Maven apps would know they
          had conflicting versions (one, from Solr, of what is basically a
          pre-release snapshot of commons-csv, and another of a real, eventual
          future commons-csv release).

          Or.... maybe Maven should not make it a hard requirement that a
          non-snapshot release (of Solr) cannot depend on a snapshot release (of
          commons-csv)? After all, this is really a judgement call of the
          committers (in Solr), as to whether a snapshot release is "good
          enough"/well tested, given how the project is using it, for its
          purposes/release?

          Net/net it looks like Maven's invasive/viral restrictions ("all of
          your deps become publicly released", and "your deps cannot be
          snapshots") are tying our hands here?

          Surely this problem has come up before in Maven usage?

          Show
          Michael McCandless added a comment - For JAR hell, I thought even released versions can cause problems too? Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things. I see. So then we "subverted" Maven's ability to catch the conflicting versions of the same dep, by not using the same coordinates that commons-csv will eventually use, once they release? Given that, as crazy as this sounds, maybe we should have published using the same coordinates? Ie this way Maven apps would know they had conflicting versions (one, from Solr, of what is basically a pre-release snapshot of commons-csv, and another of a real, eventual future commons-csv release). Or.... maybe Maven should not make it a hard requirement that a non-snapshot release (of Solr) cannot depend on a snapshot release (of commons-csv)? After all, this is really a judgement call of the committers (in Solr), as to whether a snapshot release is "good enough"/well tested, given how the project is using it, for its purposes/release? Net/net it looks like Maven's invasive/viral restrictions ("all of your deps become publicly released", and "your deps cannot be snapshots") are tying our hands here? Surely this problem has come up before in Maven usage?
          Hide
          Benson Margulies added a comment -

          Michael,

          I think you have put your finger on it. If you don't change the packages, don't change the gav. If you do change the packages, do change the gav.

          None of which is to claim that Maven isn't full of flaws in this whole area; it does not model the entire world of things people need to do.

          Show
          Benson Margulies added a comment - Michael, I think you have put your finger on it. If you don't change the packages, don't change the gav. If you do change the packages, do change the gav. None of which is to claim that Maven isn't full of flaws in this whole area; it does not model the entire world of things people need to do.
          Hide
          Uwe Schindler added a comment -

          Committed trunk revision: 1298164
          Backported 3.x revision: 1298169

          Steven Rowe: If I broke something with maven, can you please fix? I have no idea...

          Show
          Uwe Schindler added a comment - Committed trunk revision: 1298164 Backported 3.x revision: 1298169 Steven Rowe: If I broke something with maven, can you please fix? I have no idea...
          Hide
          Greg Stein added a comment -

          Great work, Uwe. Thanks for stepping up help out the Apache Commons people!

          Show
          Greg Stein added a comment - Great work, Uwe. Thanks for stepping up help out the Apache Commons people!
          Hide
          Steve Rowe added a comment -

          Steven Rowe: If I broke something with maven, can you please fix? I have no idea...

          Only one problem: the name of the source file in the bootstrap profile (copies lib/ files to users' local Maven repositories) - I just committed the file renaming to both trunk and branch_3x.

          Show
          Steve Rowe added a comment - Steven Rowe: If I broke something with maven, can you please fix? I have no idea... Only one problem: the name of the source file in the bootstrap profile (copies lib/ files to users' local Maven repositories) - I just committed the file renaming to both trunk and branch_3x.
          Hide
          Scott Carey added a comment -

          This is a maven issue. solr-commons-csv only exists to maven.

          Otherwise these are just classes inside the war file (essentially an implementation detail of solr).

          The java classpath is not an implementation detail if you expect any of your jars to share a classloader with others. At least not until project Jigsaw is complete and Solr is using Java 8 modules.

          Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things.

          Exactly. If you publish a maven artifact under a custom groupId:artifactId the sane thing to do is make sure the classes don't overlap with the old one. Maven guards against duplicate artifacts in the same classpath. Otherwise users will implicitly get both copies on their classpath too easily.

          The arguments that this is more than a maven issue are bogus.
          Do you take me for a fool?

          What happens if a non-maven user gets both solr-commons-csv.jar and commons-csv.jar on their classpath?

          I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter.

          Then specify it as <optional>true</optional> or <scope>provided</scope>. If you publish to maven central a pom.xml that says "Solr must have jar FOO!" and then complain that it is a "Maven problem" when jar FOO is actually used, then you've used maven incorrectly.

          Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users.

          or C)
          you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.

          -1 from me for using optional dependencies to counter Maven's virality.

          Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.

          Show
          Scott Carey added a comment - This is a maven issue. solr-commons-csv only exists to maven. Otherwise these are just classes inside the war file (essentially an implementation detail of solr). The java classpath is not an implementation detail if you expect any of your jars to share a classloader with others. At least not until project Jigsaw is complete and Solr is using Java 8 modules. Yes of course. But e.g. maven can help to prevent those problems. It will not allow you to add the same dependency in different versions. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things. Exactly. If you publish a maven artifact under a custom groupId:artifactId the sane thing to do is make sure the classes don't overlap with the old one. Maven guards against duplicate artifacts in the same classpath. Otherwise users will implicitly get both copies on their classpath too easily. The arguments that this is more than a maven issue are bogus. Do you take me for a fool? What happens if a non-maven user gets both solr-commons-csv.jar and commons-csv.jar on their classpath? I think you missed my point?: Changing the Ant build to fix a Maven problem is a non-starter. Then specify it as <optional>true</optional> or <scope>provided</scope>. If you publish to maven central a pom.xml that says "Solr must have jar FOO!" and then complain that it is a "Maven problem" when jar FOO is actually used, then you've used maven incorrectly. Users would have to either have to a) download the binary release and manually install the non-Maven artifacts one-by-one in their local or internal repository (after consulting both the top-level Maven POM and a list of per-module dependencies that currently only exists in a lib/ directory listing); or b) download the source release, run ant get-maven-poms, then run mvn -N -Pbootstrap install. Neither of these fall within the expected level of effort for Maven users. or C) you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate. -1 from me for using optional dependencies to counter Maven's virality. Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.
          Hide
          Steve Rowe added a comment -

          or C)
          you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate.

          -1 from me for using optional dependencies to counter Maven's virality.

          Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package.

          My implicit assumption was that as a result of using optional dependencies, we would cease publishing any third-party dependencies to Maven Central. (That's part of what I was -1'ing.)

          Show
          Steve Rowe added a comment - or C) you can publish the artifacts, link to them as optional, and users can specify in their pom's which jar to bring in. This is not that uncommon – it is the expected way to deal with slf4j for example, where one and only one of several options must be chosen at runtime. Same thing with the bytecode re-writing dependency in Hibernate. -1 from me for using optional dependencies to counter Maven's virality. Please consider this further with option C) above for artifacts that live in a custom groupId:artifactId namespace but not a custom package. My implicit assumption was that as a result of using optional dependencies, we would cease publishing any third-party dependencies to Maven Central. (That's part of what I was -1'ing.)
          Hide
          Michael McCandless added a comment -

          If you don't change the packages, don't change the gav. If you do change the packages, do change the gav.

          +1

          And... I hate to heap even more invasiveness/virality onto Maven, but... it
          seems like if Maven also checked across artifacts
          (groupID+artifactID?) for package name conflicts... that would prevent
          future cases of accidentally releasing a private dependency? (Hmmm,
          assuming commons-csv releases snapshots...).

          Show
          Michael McCandless added a comment - If you don't change the packages, don't change the gav. If you do change the packages, do change the gav. +1 And... I hate to heap even more invasiveness/virality onto Maven, but... it seems like if Maven also checked across artifacts (groupID+artifactID?) for package name conflicts... that would prevent future cases of accidentally releasing a private dependency? (Hmmm, assuming commons-csv releases snapshots...).
          Hide
          Sami Siren added a comment -

          I discovered this issue because someone who imported solr-commons-csv in its project reported an issue (SANDBOX-330), an infinite loop with trailing comments, that was already fixed on the trunk

          So is the current solr version affected by this? Should it be updated?

          Show
          Sami Siren added a comment - I discovered this issue because someone who imported solr-commons-csv in its project reported an issue ( SANDBOX-330 ), an infinite loop with trailing comments, that was already fixed on the trunk So is the current solr version affected by this? Should it be updated?
          Hide
          Emmanuel Bourg added a comment -

          No he used an old version of solr-commons-csv, I checked the commit log and the one used by Solr is fixed.

          Show
          Emmanuel Bourg added a comment - No he used an old version of solr-commons-csv, I checked the commit log and the one used by Solr is fixed.

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Emmanuel Bourg
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development