NUTCH-1385

More robust plug-in order properties in "nutch-site.xml"

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: indexer, parser
    • Labels:
    • Patch Info: Patch Available

      Description

      When listing multiple plug-ins (for example, scoring filters) in certain order properties in "nutch-site.xml" (listed below), the value must not begin with whitespace: any leading space, newline, or tab causes an exception.

      E.g.:
      This is fine:
      <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>

      Either of these will generate an exception:
      <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
      <value>
      org.apache.nutch.scoring.opic.OPICScoringFilter
      myFilter
      </value>

      Affects these properties in "nutch-site.xml":

      • indexingfilter.order
      • urlnormalizer.order
      • urlfilter.order
      • htmlparsefilter.order
      • scoring.filter.order

      Solution: replaced

      order.split("\\s+")

      with

      order.trim().split("\\s+")

      Patch provided.
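
      The behaviour behind this fix can be reproduced in isolation. In Java, String.split keeps a leading empty string when the input starts with a match of the delimiter pattern, so a value that begins with whitespace yields an extra empty first token. The sketch below is only an illustration: the class and method names (OrderSplitDemo, splitRaw, splitTrimmed) are hypothetical and are not taken from the Nutch source or the attached patch.

```java
import java.util.Arrays;

public class OrderSplitDemo {
    // Hypothetical helper: splits the property value without trimming,
    // as in the expression that the patch replaces.
    static String[] splitRaw(String order) {
        return order.split("\\s+");
    }

    // Hypothetical helper: trims first, as in the patched expression.
    static String[] splitTrimmed(String order) {
        return order.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // A multi-line <value> body like the failing example in the description.
        String order = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n";

        // The untrimmed split starts with an empty token.
        System.out.println(Arrays.toString(splitRaw(order)));

        // The trimmed split contains only the two filter names.
        System.out.println(Arrays.toString(splitTrimmed(order)));
    }
}
```

      With the untrimmed split, the first array element is an empty string rather than a filter name, which is consistent with the exception described above. Trailing whitespace is harmless in both variants because String.split drops trailing empty strings.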

        Attachments

        1. nutch-1385.txt
          3 kB
          Andy Xue

            People

            • Assignee: Markus Jelsma (markus17)
            • Reporter: Andy Xue (andyxueyuan)
            • Votes: 0
            • Watchers: 4
