NUTCH-717

Make Nutch Solr integration easier

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 1.9
    • Component/s: None
    • Labels:
      None

      Description

      Erik Hatcher proposed that we provide a full Solr config dir to be used with Nutch-Solr. Currently we only provide the index schema. It would be considerably easier to set up Nutch-Solr if we provided the whole conf dir, which you could use with Solr like this:

      java -Dsolr.solr.home=<Nutch's Solr Home> -jar start.jar
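As a rough illustration of what "the whole conf dir" would need to contain, the sketch below checks a candidate Solr home before launching. The function name `check_solr_home` is hypothetical, and the two file names are an assumption based on the usual Solr home layout (`conf/schema.xml` and `conf/solrconfig.xml`); a real Nutch-provided directory may need more files than these.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR looks like a usable Solr home.
# Assumption: a Solr home needs at least conf/schema.xml and conf/solrconfig.xml.
check_solr_home() {
  home="$1"
  for f in conf/schema.xml conf/solrconfig.xml; do
    if [ ! -f "$home/$f" ]; then
      # Report the first missing file on stderr and fail.
      echo "missing: $home/$f" >&2
      return 1
    fi
  done
  return 0
}
```

It could be run as `check_solr_home "<Nutch's Solr Home>"` before the `java -Dsolr.solr.home=... -jar start.jar` command above, so that a missing config file is reported before Solr is started.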

          Activity

          Markus Jelsma added a comment -

          20120304-push-1.6

          Lewis John McGibbney added a comment -

          Are we to provide any support for users wishing to use Solr within a container such as Tomcat? For example, will it be necessary for us to ship a .WAR file to incorporate the suggestions here? Personally, I first began using a solr.war with Tomcat because of the production environment I was in and the requirement to monitor and run everything through Tomcat; however, I now find running Solr independently inside Jetty a more suitable option. What are the thoughts here?

          Regarding your comment, Julien, I am very much in favour of making a Solr indexing backend pluggable. It would establish a nice structure/precedent for any future options we wish to support, as stated by Markus.

          This is on the radar for both 1.4 and 2.0 though... what and where are the differences? I think we can only begin to make progress with this when both incorporated issues as above are resolved.

          Markus Jelsma added a comment -

          Makes sense indeed! The same would be true for ES: bundle it in the plugin-to-be-made.

          Julien Nioche added a comment -

          Maybe we could make the indexing backends pluggable first and move the SOLR-related stuff to a new plugin? The plugin would have a custom task (e.g. startSOLR) as you described, but this would not affect the common build.xml, and the various config files would be kept separate from the content of the main conf dir. Makes sense?

          Markus Jelsma added a comment -

          We can add a Solr instance with Jetty and deploy it in the runtime directory. If a user can simply go to the runtime/solr directory and run java -jar start.jar, it greatly reduces the hassle for new users. We can then also move our schema.xml to the proper location.

          Eric Pugh added a comment -

          After having had a chance to work through the updated tutorial: http://wiki.apache.org/nutch/RunningNutchAndSolr#A4._Setup_Solr_for_search I think the Solr step is a bit awkward as well. One of the reasons Solr has seen great adoption is that the /example app is so well thought out and easy to get started with.

          With Nutch being decoupled from Solr, but depending on Solr, I wonder if there is an issue of a user downloading Nutch, and then downloading Solr, and the versions being out of whack? For example, if Nutch depends on Solr 3.3, but Solr 4 has been released.

          I could see taking a version of Solr and checking it in. Strip it down to JUST what Nutch needs, and then you could include in the Nutch version a nice Velocity based UI for just browsing through the data that Nutch returns.

          Markus Jelsma added a comment -

          Back on the radar for 2.0?

          Chris A. Mattmann added a comment -

          pushing this out per http://bit.ly/c7tBv9

          Doğacan Güney added a comment -

          +1 from me too.

          I have another proposition: would it make sense to add the full Solr jar (and webapp) to our code base and make something like this work?

          bin/nutch solrserver

          This would make integration much easier, but may make the Nutch distribution very big.
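A minimal sketch of what such a subcommand might do, under stated assumptions: the function name `solrserver`, the default `runtime/solr` directory, and the `solr` subdirectory used as Solr home are all hypothetical and not part of Nutch. The only parts taken from this issue are the `-Dsolr.solr.home` property and `java -jar start.jar`.

```shell
#!/bin/sh
# Hypothetical sketch of a "solrserver" subcommand for bin/nutch.
# Assumptions: the bundled Solr lives in runtime/solr, Jetty's start.jar sits
# at its top level, and the Solr home is the "solr" subdirectory next to it.
solrserver() {
  solr_dir="${1:-runtime/solr}"
  # Fail early with a clear message instead of a Java stack trace.
  if [ ! -f "$solr_dir/start.jar" ]; then
    echo "start.jar not found in $solr_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$solr_dir" && exec java -Dsolr.solr.home="$PWD/solr" -jar start.jar)
}
```

The early check is there because several commenters report that a failed setup gives no obvious error; whether bundling Solr is worth the size increase remains the open question.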

          Alex McLintock added a comment -

          +1

          I've just tried to integrate Solr and Nutch following a fairly clear explanation. However, I failed to get it working and have no obvious errors in the log files to tell me what went wrong. Anything that can be done to simplify this process would be helpful.


            People

            • Assignee: Unassigned
            • Reporter: Sami Siren
            • Votes: 2
            • Watchers: 2
