Nutch / NUTCH-2297

CrawlDbReader -stats wrong values for earliest fetch time and shortest interval

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.13
    • Fix Version/s: 1.14
    • Component/s: crawldb
    • Labels:
      None

      Description

      NUTCH-2286 added min, max and average for fetch interval and fetch time.
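      When running in distributed mode (not reproducible in local mode), the minimum values (earliest fetch time and shortest fetch interval) may be wrong and implausible. In the output below, the shortest fetch interval is longer than the longest one, and the earliest fetch time lies in the year 3106, after the latest fetch time: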

      TOTAL urls: 7180518032
       shortest fetch interval:    175 days, 00:00:00             <<<<<< ????
       avg fetch interval: 10 days, 08:01:36
       longest fetch interval:     15 days, 18:00:00
       earliest fetch time:        Thu Dec 20 05:30:00 UTC 3106   <<<<<< ????
       avg of fetch times: Fri Feb 19 00:07:00 UTC 2016
       latest fetch time:  Mon Jul 18 05:22:00 UTC 2016
       retry 0:    6907984913
       retry 1:    148125397
       retry 2:    82761892
       retry 3:    41645830
       min score:  0.0
       avg score:  0.014360981
       max score:  9.25
       ...
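
      The marked values violate the invariant min <= avg <= max. The issue text does not state the root cause, so the following is only a minimal sketch of that invariant and of how partial minima could be merged associatively across tasks. All names here (MinMergeSketch, Partial, merge, isPlausible) are hypothetical and are not taken from the Nutch code base.

```java
import java.util.Arrays;
import java.util.List;

public class MinMergeSketch {

    /** Partial statistics as one task might emit them (hypothetical type, not Nutch API). */
    static final class Partial {
        final long min;
        final long max;
        final long sum;
        final long count;

        Partial(long min, long max, long sum, long count) {
            this.min = min;
            this.max = max;
            this.sum = sum;
            this.count = count;
        }
    }

    /** Merges two partials: min of mins, max of maxes, sums and counts added. */
    static Partial merge(Partial a, Partial b) {
        return new Partial(
            Math.min(a.min, b.min),
            Math.max(a.max, b.max),
            a.sum + b.sum,
            a.count + b.count);
    }

    /** Folds a list of partials, starting from a neutral element. */
    static Partial mergeAll(List<Partial> parts) {
        Partial acc = new Partial(Long.MAX_VALUE, Long.MIN_VALUE, 0L, 0L);
        for (Partial p : parts) {
            acc = merge(acc, p);
        }
        return acc;
    }

    /** Checks the invariant min <= avg <= max for reported statistics. */
    static boolean isPlausible(long min, double avg, long max) {
        return min <= avg && avg <= max;
    }

    public static void main(String[] args) {
        // Fetch intervals from the report above, converted to seconds.
        long shortest = 175L * 86400L;                          // 175 days, 00:00:00
        double average = 10L * 86400L + 8L * 3600L + 96L;       // 10 days, 08:01:36
        long longest = 15L * 86400L + 18L * 3600L;              // 15 days, 18:00:00
        System.out.println("reported values plausible: "
            + isPlausible(shortest, average, longest));

        // Two made-up partials (intervals in seconds), merged as a reduce step might do.
        Partial merged = mergeAll(Arrays.asList(
            new Partial(86400L, 1296000L, 2592000L, 3L),
            new Partial(172800L, 1360800L, 1728000L, 2L)));
        System.out.println("merged min: " + merged.min + ", merged max: " + merged.max);
    }
}
```

      Because min and max are associative and commutative, the merged result does not depend on how the partials are grouped, which is the property a combine step relies on.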
      

              People

              • Assignee:
                Sebastian Nagel (snagel)
              • Reporter:
                Sebastian Nagel (snagel)
