Nutch / NUTCH-1403

Add default ScoringFilter for manipulating metadata

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: None
    • Fix Version/s: 1.19
    • Component/s: None
    • Labels: None

    Description

      This is currently done by the urlmeta plugin, whose name is too vague and whose indexing filter is redundant now that we have the index-metadata plugin. This scoring filter would help define which metadata to pass from:

      • the crawl metadata to the content metadata
      • the content metadata to the parse metadata
      • the parse metadata to the crawldatum for the outlinks

      I'd make this scoring filter available by default, i.e. not in a separate plugin, as its functionality is commonly used.
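
      The three hand-offs listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: plain `Map<String, String>` objects stand in for Nutch's crawl, content, and parse metadata containers, and the class name `MetadataPropagationSketch`, the helper `copyKeys`, and the keys `seedLabel` and `internalFlag` are all hypothetical. It does not implement Nutch's actual ScoringFilter interface.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: plain maps stand in for Nutch's metadata containers. */
class MetadataPropagationSketch {

    /** Copies only the listed keys from source to target; keys absent in source are skipped. */
    static void copyKeys(Map<String, String> source, Map<String, String> target, List<String> keys) {
        for (String key : keys) {
            String value = source.get(key);
            if (value != null) {
                target.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed configuration: the keys to pass along at each stage.
        List<String> crawlToContent = Arrays.asList("seedLabel");
        List<String> contentToParse = Arrays.asList("seedLabel");
        List<String> parseToOutlinks = Arrays.asList("seedLabel");

        Map<String, String> crawlMeta = new HashMap<>();
        crawlMeta.put("seedLabel", "news");
        crawlMeta.put("internalFlag", "x"); // not listed, so it is not propagated

        // Stage 1: crawl metadata -> content metadata
        Map<String, String> contentMeta = new HashMap<>();
        copyKeys(crawlMeta, contentMeta, crawlToContent);

        // Stage 2: content metadata -> parse metadata
        Map<String, String> parseMeta = new HashMap<>();
        copyKeys(contentMeta, parseMeta, contentToParse);

        // Stage 3: parse metadata -> metadata of the crawl datum created for an outlink
        Map<String, String> outlinkMeta = new HashMap<>();
        copyKeys(parseMeta, outlinkMeta, parseToOutlinks);

        System.out.println(outlinkMeta); // prints {seedLabel=news}
    }
}
```

      Keeping a separate key list per stage means a value can be dropped at any hand-off simply by leaving it out of that stage's list.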

            People

              Assignee: Unassigned
              Reporter: Julien Nioche (jnioche)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: