  1. Nutch
  2. NUTCH-421

Allow predeterminate running order of index filters

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.8.1
    • Fix Version/s: 0.9.0
    • Component/s: indexer
    • Labels: None
    • Environment: All

    Description

      I've tested a patch for org.apache.nutch.indexer.IndexingFilters, allowing the user to state in which order the indexing filters are to be run based on a new
      indexingfilter.order property. This is needed when a filter needs to rely on previously generated document fields as a source of input to generate further fields.
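
To illustrate why run order can matter, here is a minimal sketch in plain Java. It does not use the Nutch API: the document is modelled as a simple map, and both filters (ADD_TITLE, ADD_TITLE_LENGTH) and the field names are hypothetical, chosen only for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustration only; these filters and field names are hypothetical, not Nutch classes. */
final class FilterChainSketch {

    // Hypothetical filter: writes a "title" field.
    static final UnaryOperator<Map<String, String>> ADD_TITLE = doc -> {
        doc.put("title", "Example Page");
        return doc;
    };

    // Hypothetical filter: derives "titleLength" from "title", if "title" is already present.
    static final UnaryOperator<Map<String, String>> ADD_TITLE_LENGTH = doc -> {
        String title = doc.get("title");
        if (title != null) {
            doc.put("titleLength", String.valueOf(title.length()));
        }
        return doc;
    };

    // Applies the filters to an initially empty document, in list order.
    static Map<String, String> run(List<UnaryOperator<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (UnaryOperator<Map<String, String>> filter : filters) {
            doc = filter.apply(doc);
        }
        return doc;
    }
}
```

Running ADD_TITLE before ADD_TITLE_LENGTH yields a document with both fields. With the order reversed, the dependent filter finds no "title" yet, so "titleLength" is never written.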

      As suggested elsewhere, I based this on the urlfilter.order functionality:

      <property>
      <name>indexingfilter.order</name>
      <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
      <description>The order by which index filters are applied.
      If empty, all available index filters (as dictated by properties
      plugin.includes and plugin.excludes above) are loaded and applied in system
      defined order. If not empty, only named filters are loaded and applied
      in given order. For example, if this property has value:
      org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
      then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
      Filter ordering can affect the end result when one filter depends
      on fields generated by an earlier filter.
      </description>
      </property>
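
The selection rule in the description above (empty value: all available filters in system order; non-empty value: only the named filters, in the given order) could be sketched roughly as follows. This is not the code from nutch-421.patch. The class FilterOrderSketch and the method orderFilters are hypothetical, and silently skipping a listed name that has no matching filter is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not the implementation from nutch-421.patch. */
final class FilterOrderSketch {

    /**
     * Returns the filters to run, in run order.
     *
     * @param available  filters found by the plugin system, keyed by class name
     * @param orderValue raw value of the indexingfilter.order property (may be null or blank)
     */
    static <T> List<T> orderFilters(Map<String, T> available, String orderValue) {
        // Empty property: keep every available filter, in the map's own iteration order.
        if (orderValue == null || orderValue.trim().isEmpty()) {
            return new ArrayList<>(available.values());
        }
        // Non-empty property: only the named filters, in the order they are listed.
        List<T> ordered = new ArrayList<>();
        for (String className : orderValue.trim().split("\\s+")) {
            T filter = available.get(className);
            if (filter != null) { // assumption: names without a matching filter are skipped
                ordered.add(filter);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Strings stand in for filter instances; the map order plays the role of "system order".
        Map<String, String> available = new LinkedHashMap<>();
        available.put("org.apache.nutch.indexer.more.MoreIndexingFilter", "more");
        available.put("org.apache.nutch.indexer.basic.BasicIndexingFilter", "basic");

        String order = "org.apache.nutch.indexer.basic.BasicIndexingFilter "
                + "org.apache.nutch.indexer.more.MoreIndexingFilter";
        System.out.println(orderFilters(available, order));
    }
}
```

Keying the available filters by class name keeps the lookup simple and mirrors the way the property value lists fully qualified class names separated by whitespace.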

      Attachments

        1. nutch-421.patch
          66 kB
          Alan Tanaman

          People

            Assignee: Sami Siren
            Reporter: Alan Tanaman