1. Nutch
  2. NUTCH-422

index-extra plugin creates additional fields in the index, based on configurable logic


    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.8.1
    • Fix Version/s: 1.5
    • Component/s: indexer
    • Labels:
    • Environment:

      All environments


      Extract from the Readme file:

      A. Introduction

      The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:

      • The parsed text
      • Metadata fields
      • Previously created fields of the document to be indexed
      • A plain constant string
      • A Java expression combining one or more of the above and resolving to a string

      A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.
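
      The regex-based extraction described above can be sketched roughly as follows. This is only an illustration of the idea, not the plugin's code: the class name ExtraFieldSketch, the extract helper, and the sample text are all hypothetical, and the real plugin reads its sources and patterns from its own configuration file.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: derive a field value by applying a regex to a source string.
class ExtraFieldSketch {

    /**
     * Returns the first capture group of the first match, or the whole match
     * when the pattern defines no groups. Empty when nothing matches.
     */
    static Optional<String> extract(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return Optional.empty();
        }
        String value = m.groupCount() >= 1 ? m.group(1) : m.group();
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        // Assumed sample of parsed text; any of the listed sources could be used instead.
        String parsedText = "Order ref: INV-20417 shipped on Monday";
        // Prints 20417
        System.out.println(extract(parsedText, "INV-(\\d+)").orElse("<no match>"));
    }
}
```

      The same helper would work on a metadata value, an existing document field, or a constant, since each of them is ultimately a string.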

      B. Installation

      1) Binaries only:
         • Copy the included 'index-extra' folder to NUTCHDIR/build.
         • Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf and configure it.
         • Enable the plugin by updating the nutch-site.xml file.
      2) Source code (always refer to the Nutch wiki for detailed instructions on building Nutch; in short):
         • Copy the included 'index-extra' folder to NUTCHDIR/src/plugin.
         • Update the build.xml in NUTCHDIR/src/plugin to include the plugin.
         • Update the NUTCHDIR/ file to include the plugin.
         • Run ant to build.
         • Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf and configure it.
         • Enable the plugin by updating the nutch-site.xml file.
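
      As a rough illustration of the final step, Nutch plugins are normally enabled through the plugin.includes property, whose value is a regular expression over plugin ids. The fragment below assumes the plugin id is 'index-extra' (inferred from the folder name); the other ids are placeholders, so start from the value already present in your own configuration.

```xml
<!-- nutch-site.xml: sketch only. The id 'index-extra' is an assumption based on the folder name. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|extra)|query-(basic|site|url)</value>
</property>
```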

      C. Known Issues

      1) For this plugin to work correctly on any document field, the other index filters must run
      first, so that all basic document fields have already been generated. To do this, configure the indexingfilter.order
      property. (See patch NUTCH-421, which enables the indexingfilter.order property. Without that patch,
      the plugin will still work, but will not be able to use document fields created by other index filter plugins.)

      2) At this stage, field boost cannot be used, because Nutch scoring overrides the field boost with its own
      document-level boost calculation. This occurs at the end of the reduce method of org.apache.nutch.indexer.Indexer.
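
      To see why filter order matters for known issue 1, consider the simplified model below. It does not use the Nutch API: the filters are hypothetical stand-ins that write into a plain map, and the field names are invented for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model of an indexing filter chain; not the Nutch IndexingFilter interface.
class FilterOrderSketch {

    // Stand-in for a basic filter that creates a document field.
    static final Consumer<Map<String, String>> BASIC =
        doc -> doc.put("title", "Nutch Plugins");

    // Stand-in for an "extra" filter that derives a field from an existing one.
    static final Consumer<Map<String, String>> EXTRA = doc -> {
        String title = doc.get("title");
        if (title != null) {
            doc.put("title_lc", title.toLowerCase(Locale.ROOT));
        }
    };

    // Applies the filters in the given order to a fresh, empty document.
    static Map<String, String> run(List<Consumer<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Consumer<Map<String, String>> filter : filters) {
            filter.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Basic filter first: the derived field is created.
        System.out.println(run(Arrays.asList(BASIC, EXTRA)));
        // Extra filter first: "title" does not exist yet, so nothing is derived.
        System.out.println(run(Arrays.asList(EXTRA, BASIC)));
    }
}
```

      In the second ordering the derived field is silently missing, which matches the behaviour described for an installation without the NUTCH-421 patch.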


          Alan Tanaman created issue
          Alan Tanaman made changes:
            Attachment: [ 12348013 ]
            Attachment: [ 12348012 ]
          Sami Siren made changes:
            Assignee: Sami Siren [ siren ]
          garpinc made changes:
            Attachment: [ 12438868 ]
          Julien Nioche made changes:
            Link: This issue relates to NUTCH-809 [ NUTCH-809 ]
          Lewis John McGibbney made changes:
            Link: This issue is related to NUTCH-809 [ NUTCH-809 ]
          Julien Nioche made changes:
            Status: Open [ 1 ] → Resolved [ 5 ]
            Fix Version/s: 1.5 [ 12318246 ]
            Resolution: Duplicate [ 3 ]
          Lewis John McGibbney made changes:
            Status: Resolved [ 5 ] → Closed [ 6 ]


            • Assignee:
              Sami Siren
            • Reporter:
              Alan Tanaman
            • Votes:
              4
            • Watchers:
              7


              • Created: