Nutch / NUTCH-2125

Metrics tool for relevancy


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.10
    • Fix Version/s: None
    • Component/s: tool

Description

    Purpose: provide a metric for determining the “relevancy” of a crawl after each round, and the “relevancy” of an individual page. NB: this is not a scoring plugin. By default, the top 25 terms will be stored.

    • Return the topN terms per page
    • Return the topN terms per segment, ranked by tf-idf
    • Leverage the Apache Lucene libraries
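As a rough illustration of the two rankings listed above, here is a minimal Java sketch. It is not Nutch or Lucene code, and everything in it is an assumption: the class `RelevancyTerms` and its methods are hypothetical, a regex tokenizer stands in for a Lucene analyzer, page-level terms are ranked by raw term frequency (the issue does not say how), and the segment score sums tf(t, d) · ln(N / df(t)) over all N pages, which is only one common tf-idf variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch: top-N terms per page (raw tf) and per segment (summed tf-idf). */
public class RelevancyTerms {

    // Splits on anything that is not a letter or digit; a stand-in for a Lucene Analyzer.
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{Nd}]+");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : NON_WORD.split(text.toLowerCase(Locale.ROOT))) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Counts how often each token occurs in one page. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(text)) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Keeps the n highest-scoring terms; ties are broken alphabetically. */
    static <V extends Comparable<V>> List<String> topN(Map<String, V> scores, int n) {
        Comparator<Map.Entry<String, V>> byScoreDesc = (a, b) -> {
            int c = b.getValue().compareTo(a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        };
        return scores.entrySet().stream()
                .sorted(byScoreDesc)
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Top-N terms for one page, ranked by raw term frequency (assumed criterion). */
    static List<String> topTermsForPage(String pageText, int n) {
        return topN(termFrequencies(pageText), n);
    }

    /**
     * Top-N terms for a segment: score(t) = sum over pages d of tf(t, d) * ln(N / df(t)).
     * A term that occurs on every page gets idf = 0 and therefore a score of 0.
     */
    static List<String> topTermsForSegment(List<String> pages, int n) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String page : pages) {
            Map<String, Integer> tf = termFrequencies(page);
            tfs.add(tf);
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum); // document frequency: pages containing term
            }
        }
        int numPages = pages.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map<String, Integer> tf : tfs) {
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                double idf = Math.log((double) numPages / df.get(e.getKey()));
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return topN(scores, n);
    }

    public static void main(String[] args) {
        List<String> segment = Arrays.asList(
                "Apache Nutch is a web crawler. Nutch crawls the web.",
                "Apache Lucene is a search library.",
                "Apache Solr is built on Lucene.");
        // The issue's default would be N = 25; N = 3 keeps the output short.
        System.out.println("page 0:  " + topTermsForPage(segment.get(0), 3));
        System.out.println("segment: " + topTermsForSegment(segment, 3));
    }
}
```

Running `main` prints `page 0:  [nutch, web, a]` and `segment: [nutch, web, built]`. The segment ranking pushes `apache` and `is` to the bottom because they occur on every page (idf = 0), which is the point of using tf-idf at segment level. The stray `a` in the page ranking is the kind of token a stop-word filter, such as Lucene's `StopFilter`, would remove in a real analyzer chain.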


People

    • Assignee: Unassigned
    • Reporter: Kim Whitehall (kwhitehall)
    • Votes: 0
    • Watchers: 1

Dates

    • Created:
    • Updated: