Nutch / NUTCH-1890

Major Typo in Documentation for Integrating Nutch and Solr



    Description

      Problematic Page: https://wiki.apache.org/nutch/NutchTutorial

      1. Duplicated Text
      In section "6. Integrate Solr with Nutch", the tutorial asks the reader to comment out the following line, changing it from:

      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->

      to

      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->

      The "from" and "to" versions are identical. I think it should instead read from:

      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

      to

      <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
      2. Addition of extra step
      After following the recommended steps in Section 6 to integrate with Solr, I got an error: 'field text not defined'. This happened because my solrconfig.xml defined 'text' as the default field, but that field was not defined in the schema.xml I had imported from the Nutch conf folder.

      I propose that either the schema.xml in the Nutch conf folder ship with the 'text' field already defined, or an extra step be added to Section 6 that reads:
      Add the following line under the definition of the 'content' field:
      <field name="text" type="text" stored="true" indexed="true"/>
      Better still, steps could be added that let the user change the default field in solrconfig.xml from 'text' to 'content', whichever solution seems most appropriate.
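The two proposed fixes could look roughly like the fragments below; they belong to two different files and are not one XML document. Option A assumes schema.xml already defines a field type named `text`. The `copyField` line is an optional assumption, not part of the proposal: it would fill `text` from `content` so that queries on the default field return matches. Option B assumes a Solr 4.x-style solrconfig.xml where the default field is set through the `df` parameter of the `/select` handler; other Solr versions may configure the default field differently.

```xml
<!-- Option A (schema.xml): declare 'text' next to the existing 'content' field -->
<field name="text" type="text" stored="true" indexed="true"/>
<!-- Optional, assumed: copy indexed content into 'text' -->
<copyField source="content" dest="text"/>

<!-- Option B (solrconfig.xml): point the default field at 'content' instead of 'text' -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">content</str>
  </lst>
</requestHandler>
```

Option B avoids touching the schema at all, which keeps the Nutch-supplied schema.xml unmodified.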


          People

            chrismattmann Chris A. Mattmann
            ekowcharles Boadu Akoto Charles Jnr


              Time Tracking

                Estimated: 1h
                Remaining: 1h
                Logged: Not Specified