Nutch / NUTCH-2039

Relevance based scoring filter

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.11
    • Component/s: None

      Description

      A ScoringFilter plugin that uses a similarity measure to compare a given reference page (the gold standard) with the currently parsed page. The resulting similarity score is then distributed to the parsed page's outlinks. The filter aims to focus the crawler on exploring relevant pages.
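
      The idea described above can be sketched roughly as follows. This is a minimal illustration only, not the plugin's actual code, and it does not use the Nutch ScoringFilter API. The class name RelevanceScoringSketch and its methods are hypothetical. Two things are assumed because the description does not specify them: cosine similarity over raw term frequencies stands in for "a similarity measure", and every outlink simply receives the parent page's similarity score.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the plugin's code and not the Nutch ScoringFilter API.
final class RelevanceScoringSketch {

    // Build a term-frequency vector from lowercased tokens split on non-word characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Assumption: cosine similarity is the similarity measure.
    // Returns 0.0 when either vector is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Assumption: each outlink receives the parent page's similarity score unchanged.
    static Map<String, Double> scoreOutlinks(String goldStandardText,
                                             String parsedPageText,
                                             List<String> outlinks) {
        double score = cosineSimilarity(termFrequencies(goldStandardText),
                                        termFrequencies(parsedPageText));
        Map<String, Double> scores = new HashMap<>();
        for (String url : outlinks) {
            scores.put(url, score);
        }
        return scores;
    }
}
```

      Under these assumptions, outlinks of a page that closely resembles the gold standard carry a high score and those of an unrelated page carry a score near zero, which is what would steer the crawl toward relevant pages.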

            People

            • Assignee: Chris A. Mattmann (chrismattmann)
            • Reporter: Sujen Shah (sujenshah)
            • Votes: 0
            • Watchers: 7
