Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: nutchgora
    • Fix Version/s: nutchgora
    • Component/s: build
    • Labels:
      None

      Description

      Ivy is the de facto dependency management tool used in conjunction with Ant. It would be nice if we switched to using Ivy in Nutch builds.

      Maven is also an alternative, but I think Nutch will benefit more from an Ant+Ivy architecture.

      1. NUTCH-821.patch
        174 kB
        Julien Nioche
      2. nutchbase-ivy_v1.patch
        169 kB
        Enis Soztutar

        Activity

        ab Andrzej Bialecki added a comment -

        +1 for Ivy.

        castagna Paolo Castagna added a comment -

        Are the Nutch artifacts published somewhere in a repository available for Maven or Ant+Ivy users?

        castagna Paolo Castagna added a comment -

        If your aim is to publish Nutch artifacts in a repository which is compatible with Maven as well as Ivy, you might consider using Maven Ant Tasks (http://maven.apache.org/ant-tasks/), which IMHO is less painful on the publishing side.

        jnioche Julien Nioche added a comment -

        Ciao Paolo, the aim is not to publish Nutch artifacts but mainly to manage the dependencies. A simple and straightforward combination of Ant and Ivy will do fine for that.

        castagna Paolo Castagna added a comment -

        Is there any plan to publish Nutch artifacts in the future?

        chrismattmann Chris A. Mattmann added a comment -

        I'm a Maven fan but haven't had time to work up a Nutch Maven build as of yet.

        enis Enis Soztutar added a comment -

        I believe publishing Nutch artifacts to a Maven repo is another issue. We can open a new issue if there is enough demand. I believe Ivy can generate the necessary POMs for publishing to Maven.

        enis Enis Soztutar added a comment -

        I'm attaching a patch which introduces Ivy to Nutch builds. It is a patch against the current nutchbase code base hosted at GitHub.
        We can adapt this patch to the current Nutch trunk, but I see little point in doing so since the two code bases will merge eventually.

        The Ivy patch works as follows:
        • There are two configurations, default and test. Test dependencies are managed in the test configuration.
        • Libraries not managed by Ivy are still at their old locations (for example "nutch/lib/" or "nutch/src/plugin/clustering-carrot/lib/").
        • The root ivy file is located at nutch/ivy/ivy.xml.
        • Each plugin has its own ivy file, which is located at nutch/src/plugin/<plugin>/ivy.xml.
        • Root dependencies are downloaded to build/lib.
        • Plugin dependencies are downloaded to build/plugins/<plugin>/.

        I have upgraded some of the libraries in the process, but only where the release changed in the minor number alone (for example lucene-core-2.4.0 -> 2.4.1).
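
For illustration only, a root ivy file with the two configurations described above might look roughly like the sketch below. It is not taken from the attached patch: the organisation and module names and the junit entry with its revision are assumptions; only the default and test configurations and the lucene-core 2.4.1 revision come from the comment.

```xml
<!-- Hypothetical sketch of nutch/ivy/ivy.xml; not the contents of the patch. -->
<ivy-module version="2.0">
  <!-- organisation and module are assumed names -->
  <info organisation="org.apache.nutch" module="nutch"/>

  <configurations>
    <!-- dependencies needed to build and run -->
    <conf name="default" description="runtime dependencies"/>
    <!-- dependencies needed only to compile and run the tests -->
    <conf name="test" extends="default" visibility="private"
          description="test-only dependencies"/>
  </configurations>

  <dependencies>
    <!-- revision mentioned in the comment above -->
    <dependency org="org.apache.lucene" name="lucene-core" rev="2.4.1"
                conf="default->default"/>
    <!-- assumed test dependency; the revision is a placeholder -->
    <dependency org="junit" name="junit" rev="3.8.1"
                conf="test->default"/>
  </dependencies>
</ivy-module>
```

A plugin's own ivy.xml would presumably follow the same shape and list only that plugin's dependencies.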

        castagna Paolo Castagna added a comment -

        The same "pain" you are experiencing with the Nutch dependencies which are not in the Maven repository will be experienced by Nutch users who might want to depend on or reuse Nutch code in their projects.

        But, I do agree... publishing Nutch artifacts is another issue.

        enis Enis Soztutar added a comment -

        Opened issue NUTCH-825, for publishing the artifacts.

        jnioche Julien Nioche added a comment -

        Adds Ivy support for dependencies.

        The lib/ directory is maintained and will be used to store dependencies which are not accessible via Ivy (e.g. GORA). The libs managed by Ivy are put in the directory build/lib.

        This patch also differentiates the build path from the dist path.

        ab Andrzej Bialecki added a comment -

        I think this patch refers to some parts that were already removed in NUTCH-837 ...

        Also, it would be nice to have a target that sets up an Eclipse project - after this patch is applied the lib/ is nearly empty and you need to run build at least once to bring dependencies - this may be confusing.

        jnioche Julien Nioche added a comment -

        > I think this patch refers to some parts that were already removed in NUTCH-837 ...

        I applied NUTCH-837 before, but indeed this patch does remove references to parts deleted in NUTCH-837. Maybe I should have done it in a separate issue.

        > Also, it would be nice to have a target that sets up an Eclipse project - after this patch is applied the lib/ is nearly empty and you need to run build at least once to bring dependencies - this may be confusing.

        The jars are put in the build/lib directory, so this assumes that the project has been built in order to get the dependencies. I think there are resources in Eclipse for dealing with Ivy configurations. If anyone has any pointers they will be most welcome.

        jnioche Julien Nioche added a comment -

        I found http://ant.apache.org/ivy/ivyde/, which allows managing Ivy dependencies in Eclipse.
        I had to rewrite ivy/ivy.xml to make the version numbers explicit, as IvyDE was not able to load the properties in ivy/library.properties, but it worked fine after that. The beauty of it is that we don't rely on the content of build/lib at all.
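
As a rough illustration of the rewrite described above (not the actual contents of the files), the difference is between a revision resolved from a property and one written out directly in ivy.xml. The property name lucene-core.version is hypothetical; the lucene-core 2.4.1 revision is borrowed from an earlier comment in this issue.

```xml
<dependencies>
  <!-- Before (assumed form): the revision comes from a property that
       would be defined in ivy/library.properties, e.g.
       lucene-core.version=2.4.1 (hypothetical property name) -->
  <!--
  <dependency org="org.apache.lucene" name="lucene-core"
              rev="${lucene-core.version}" conf="default->default"/>
  -->

  <!-- After: the revision is explicit, so no external properties file
       has to be loaded to resolve it -->
  <dependency org="org.apache.lucene" name="lucene-core"
              rev="2.4.1" conf="default->default"/>
</dependencies>
```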

        chrismattmann Chris A. Mattmann added a comment -

        Guys,

        Why have any libs in the lib dir at all? If we need to get the jars uploaded to Maven Central, we can do a one time upload, via the process here:

        http://maven.apache.org/guides/mini/guide-central-repository-upload.html

        Which jars do we need to get into Central? I can take the lead on it as I've got some work to do there for Tika anyways.

        Cheers,
        Chris

        dogacan Doğacan Güney added a comment -

        +1 to Chris. In fact, I would ask to piggyback Gora on this process for now and push the Gora jars into this repository as well.

        teabeats Piet Schrijver added a comment -

        +1 for maven, also having HBase in there would be great (=

        jnioche Julien Nioche added a comment -

        @Chris : isn't this restricted to the jars we produce? I agree with Dogacan that this would be the right way to access the Gora jars

        Some of the plugins have their own lib directory as well as ivy dependencies. The plugins currently have unpublished dependencies on : automaton.jar, javaswf.jar, common-feedparser-0.6-fork.jar. Given that some of these plugins will be removed at some stage (feed & parse-swf moving to Tika) it is probably not worth bothering and we can keep them in the plugin libs.

        As for the main lib/ directory it currently contains only the native jars for Hadoop but the Gora related jars would have to go there as well unless we put them into Central as you suggested. We should probably discuss how to deal with Hadoop related resources (native jars, conf objects, scripts in bin) in a separate JIRA and whether or not we should keep them at all.

        Could you guys review the patch I sent yesterday? Since we'll decide what to do with the Hadoop native jars later and do not need the Gora jars just yet, it can still be applied as it is.

        chrismattmann Chris A. Mattmann added a comment -

        Hi Julien:

        I reviewed your patch, and am +1 for you to commit it. That said, I wanted to clarify my proposal. My proposal is that we get rid of the $NUTCH/lib directory in its entirety. I wasn't so much commenting on the plugin jars, though now that you bring it up, maybe we could integrate Ivy in them as well. I don't think it would be too super hard to get the jars into Maven Central. I'm reading up on the Sonatype Forge right now and might be able to get this working.

        As for Hadoop, is there any requirement that we manage the jar lib for Hadoop in Nutch? Couldn't we simply pull the jar down dynamically via Ivy or via some magic in build.xml? I think if we could, then we could simply remove $NUTCH/lib, which was my original intention.

        In the meanwhile, I'll create another issue to track all this, but you have my +1 to commit your patch and mark this issue as resolved. I'll link the new issue I create back to this one to indicate their relationship to one another.

        Cheers,
        Chris

        ab Andrzej Bialecki added a comment -

        +1 for this patch for now - all good comments, there's plenty of improvements we can make, so let's line them up as separate issues.

        jnioche Julien Nioche added a comment -

        Committed revisions 961306 and 961318.

        Compared to the patch, the management of plugins is slightly improved so that they use the common ivysettings, and ivy/library.properties has been removed to get IvyDE to work properly.

        Thanks for the comments and review of the patch.


          People

          • Assignee:
            enis Enis Soztutar
            Reporter:
            enis Enis Soztutar