Nutch / NUTCH-1430

Freegenerator records overwrite CrawlDB records with AdaptiveFetchSchedule


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.8
    • Component/s: crawldb
    • Labels: None
    • Patch Info: Patch Available

    Description

      Steps to reproduce:

      Without AdaptiveFetchSchedule:

      $ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
      URL: http://www.openindex.io/en/home.html
      Version: 7
      Status: 2 (db_fetched)
      Fetch time: Thu Aug 16 13:58:23 CEST 2012
      Modified time: Thu Jan 01 01:00:00 CET 1970
      Retries since fetch: 0
      Retry interval: 2592000 seconds (30 days)
      Score: 0.0
      Signature: c2601ca503f2fc5edcb286501d7fb271
      Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
      

      With AdaptiveFetchSchedule:

      $ bin/nutch readdb crawl/crawldb/ -url http://www.openindex.io/en/home.html
      URL: http://www.openindex.io/en/home.html
      Version: 7
      Status: 2 (db_fetched)
      Fetch time: Tue Jul 17 13:56:33 CEST 2012
      Modified time: Tue Jul 17 13:55:33 CEST 2012
      Retries since fetch: 0
      Retry interval: 60 seconds (0 days)
      Score: 0.0
      Signature: 23567bb52ee8b905b8649c4305ed82ee
      Metadata: Content-Type: text/html_pst_: success(1), lastModified=0
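
      The two dumps above differ in only a few fields. To list them quickly, each readdb output can be saved to a file and compared field by field. The following is a minimal sketch, not part of Nutch: the helper names field and changed_fields and the file names before.txt and after.txt are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved `bin/nutch readdb ... -url` dumps.
# Assumed usage (file names are illustrative):
#   bin/nutch readdb crawl/crawldb/ -url <url> > before.txt
#   ... run the crawl cycle with AdaptiveFetchSchedule enabled ...
#   bin/nutch readdb crawl/crawldb/ -url <url> > after.txt
#   changed_fields before.txt after.txt

# field NAME FILE: print the value of the "NAME: value" line in FILE.
field() {
  sed -n "s/^[[:space:]]*$1: //p" "$2"
}

# changed_fields BEFORE AFTER: print each listed field whose value differs.
changed_fields() {
  for name in "Fetch time" "Modified time" "Retry interval" "Signature"; do
    before=$(field "$name" "$1")
    after=$(field "$name" "$2")
    if [ "$before" != "$after" ]; then
      printf '%s: %s -> %s\n' "$name" "$before" "$after"
    fi
  done
}
```

      For the dumps above, the retry interval is the most telling difference: 2592000 seconds (30 days) in the first dump versus 60 seconds in the second.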
      

      Attachments

        1. NUTCH-1430-1.6-1.patch
          0.7 kB
          Markus Jelsma
        2. NUTCH-1430-1.6-2.patch
          2 kB
          Markus Jelsma


          People

            Assignee: Markus Jelsma (markus17)
            Reporter: Markus Jelsma (markus17)
            Votes: 0
            Watchers: 4
