  1. Nutch
  2. NUTCH-1985

Adding a main() method to the MimeTypeIndexingFilter

    Details

    • Patch Info:
      Patch Available
    • Flags:
      Patch

      Description

      This makes it very easy to test different rules files and to check the expressions used to filter content based on the detected MIME type. Until now, the only way to check this was to run test crawls and inspect the data stored in Solr/Elasticsearch.

      This allows invoking the class through the bin/nutch plugin command, for example:

      bin/nutch plugin mimetype-filter org.apache.nutch.indexer.filter.MimeTypeIndexingFilter -h

      Two options are accepted: -h (or --help), which shows the help, and -rules, which specifies the rules file to use. This makes it easy to experiment with different rules files until you get the desired behavior.

      After invoking the class, enter one valid MIME type per line. For each one, the output is the same MIME type prefixed with a + or - sign, indicating whether the given MIME type is allowed or denied, respectively.
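
      The interactive loop described above can be illustrated with a minimal, standalone sketch. This is not the code from the attached patch: the class name MimeTypeCheckSketch, the annotate and run helpers, and the "allow only text/*" rule are all hypothetical stand-ins, and a real run would take its decisions from the rules file. The sketch also assumes the sign is written directly in front of the MIME type with no separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

// Hypothetical illustration only; not the MimeTypeIndexingFilter from the patch.
public class MimeTypeCheckSketch {

  // Prefix the MIME type with '+' when the predicate accepts it, '-' otherwise.
  public static String annotate(String mimeType, Predicate<String> allowed) {
    return (allowed.test(mimeType) ? "+" : "-") + mimeType;
  }

  // Read one MIME type per line and print the annotated form of each.
  // Blank lines are skipped (an assumption of this sketch).
  public static void run(BufferedReader in, PrintStream out, Predicate<String> allowed)
      throws IOException {
    String line;
    while ((line = in.readLine()) != null) {
      String mimeType = line.trim();
      if (mimeType.isEmpty()) {
        continue;
      }
      out.println(annotate(mimeType, allowed));
    }
  }

  public static void main(String[] args) throws IOException {
    // Assumed stand-in rule: allow only text/* types.
    Predicate<String> allowed = t -> t.startsWith("text/");
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    run(in, System.out, allowed);
  }
}
```

      With the stand-in rule above, entering text/html would be echoed with a leading +, and application/pdf with a leading -.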

        Attachments

        1. NUTCH-1985.patch
          5 kB
          Jorge Luis Betancourt Gonzalez

            People

            • Assignee:
              Unassigned
            • Reporter:
              Jorge Luis Betancourt Gonzalez (jorgelbg)
            • Votes:
              0
            • Watchers:
              3