Nutch / NUTCH-1385

More robust plug-in order properties in "nutch-site.xml"


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6
    • Component/s: indexer, parser
    • Patch Info: Patch Available

    Description

      When listing multiple plug-ins (for example, scoring filters) in any of the properties listed below in "nutch-site.xml", no spaces, newlines, or tabs may precede the value content; otherwise an exception is thrown.

      E.g.:
      This is fine:
      <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>

      Either of these will generate an exception:
      <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
      <value>
      org.apache.nutch.scoring.opic.OPICScoringFilter
      myFilter
      </value>
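      A minimal Java sketch of why only leading whitespace matters, assuming the property value reaches the code unchanged and is split with order.split("\\s+"), as the solution below states. Java's String.split includes an empty leading substring when the input starts with a delimiter match, but drops trailing empty strings. Both failing values therefore yield an empty first entry, which is the likely source of the exception, while the trailing newline in the second example is harmless. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: splits the way the unpatched code does (no trim() first)
    static String[] splitUntrimmed(String order) {
        return order.split("\\s+");
    }

    public static void main(String[] args) {
        String[] values = {
            // works: no whitespace before the first class name
            "org.apache.nutch.scoring.opic.OPICScoringFilter myFilter",
            // fails: a single leading space
            " org.apache.nutch.scoring.opic.OPICScoringFilter myFilter",
            // fails: value written on its own lines inside <value>...</value>
            "\norg.apache.nutch.scoring.opic.OPICScoringFilter\nmyFilter\n"
        };
        for (String order : values) {
            String[] parts = splitUntrimmed(order);
            // A leading whitespace match produces an empty first element
            System.out.println(Arrays.toString(parts)
                + " first entry empty: " + parts[0].isEmpty());
        }
    }
}
```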

      Affects these properties in "nutch-site.xml":

      • indexingfilter.order
      • urlnormalizer.order
      • urlfilter.order
      • htmlparsefilter.order
      • scoring.filter.order

      Solution: replaced

      order.split("\\s+")

      with

      order.trim().split("\\s+")

      Patch provided.
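      A minimal sketch of the patched parsing, wrapped in a hypothetical helper named parseOrder (not part of Nutch's API). The trim() call matches the patch; the null and blank-value guards are an extra assumption of this sketch, added because "".split("\\s+") returns a one-element array containing an empty string.

```java
import java.util.Arrays;

public class OrderParser {
    // Hypothetical helper: trim() before split() as in the patch,
    // plus guards (an assumption, not from the patch) for null or blank values
    static String[] parseOrder(String order) {
        if (order == null) {
            return new String[0];
        }
        // trim() strips leading/trailing spaces, tabs and newlines
        String trimmed = order.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String multiLine = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n  ";
        // prints [org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(parseOrder(multiLine)));
    }
}
```

      With the trim in place, all three <value> layouts shown above yield the same two entries.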

      Attachments

        1. nutch-1385.txt (3 kB, Andy Xue)


          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Andy Xue (andyxueyuan)
            Votes: 0
            Watchers: 4
