Nutch / NUTCH-1149

DomainStats should process numeric CrawlDB metadata

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Right now the DomainStats program only outputs the number of fetched records per domain or host. It should also be able to aggregate numeric metadata values, for example to compute the average size (content length) for a given domain or host. This is also useful for generating a per-domain or per-host metric for adult material when using a plugin that stores a probability of adult material per URL in the CrawlDB.
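
      A minimal sketch of the requested aggregation, written as plain Java rather than as a Nutch or Hadoop job. The class HostMetadataStats, the Entry record type, and the aggregate method are hypothetical names used only for illustration; they are not part of Nutch. The sketch assumes each CrawlDB record can be reduced to a URL plus one metadata value stored as a string, and that records with a missing or non-numeric value should be skipped.

      ```java
      import java.net.URI;
      import java.util.Arrays;
      import java.util.List;
      import java.util.Map;
      import java.util.TreeMap;

      public class HostMetadataStats {

          /** Hypothetical stand-in for one CrawlDB record: a URL and one metadata value. */
          public static final class Entry {
              final String url;
              final String metaValue;

              public Entry(String url, String metaValue) {
                  this.url = url;
                  this.metaValue = metaValue;
              }
          }

          /** Running count and sum of the numeric values seen for one host. */
          public static final class Stat {
              public long count;
              public double sum;

              void add(double value) {
                  count++;
                  sum += value;
              }

              public double average() {
                  return count == 0 ? 0.0 : sum / count;
              }
          }

          /** Groups entries by host and accumulates count and sum of the numeric value. */
          public static Map<String, Stat> aggregate(Iterable<Entry> entries) {
              Map<String, Stat> stats = new TreeMap<>();
              for (Entry e : entries) {
                  if (e.metaValue == null) {
                      continue; // record has no such metadata key
                  }
                  String host;
                  double value;
                  try {
                      host = URI.create(e.url).getHost();
                      value = Double.parseDouble(e.metaValue.trim());
                  } catch (IllegalArgumentException ex) {
                      continue; // malformed URL or non-numeric value (assumption: skip it)
                  }
                  if (host == null) {
                      continue; // URL without a host component
                  }
                  stats.computeIfAbsent(host, h -> new Stat()).add(value);
              }
              return stats;
          }

          public static void main(String[] args) {
              List<Entry> entries = Arrays.asList(
                  new Entry("http://a.example/page1", "100"),
                  new Entry("http://a.example/page2", "200"),
                  new Entry("http://b.example/", "0.75"),
                  new Entry("http://b.example/x", "not-a-number"));

              // Print one line per host: host, record count, sum, average.
              for (Map.Entry<String, Stat> s : aggregate(entries).entrySet()) {
                  Stat st = s.getValue();
                  System.out.println(s.getKey() + "\t" + st.count + "\t" + st.sum + "\t" + st.average());
              }
          }
      }
      ```

      Keeping both the count and the sum per host means the average can be derived at output time, and the same pair could be merged across partial results if this were done in a reducer. Grouping by domain instead of host would only change how the key is derived from the URL.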

              People

              • Assignee:
                markus17 Markus Jelsma
                Reporter:
                markus17 Markus Jelsma
