Nutch / NUTCH-353

Pages that do a server-side forward are refetched every time

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.1, 0.9.0
    • Fix Version/s: 1.0.0
    • Component/s: None
    • Labels: None

    Description

      Pages that do a server-side forward are not written back into the crawlDb with a status change, and their nextFetchTime is not updated.
      As a result, the same page is refetched again and again. Nutch is therefore not polite: it refetches both the forwarding page and the target page in every segment iteration. This also affects scoring, since the forwarding page contributes its score to all of its outlinks.
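
The effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ForwardRefetchSketch`, `Entry`, `countFetches`, the example URL and the fetch interval are all hypothetical names and values chosen for illustration, and none of it is Nutch's actual crawlDb code or the attached patch. It only shows that an entry whose status and nextFetchTime are never written back stays due in every generate/fetch cycle.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: NOT Nutch's real API. All names here are hypothetical.
public class ForwardRefetchSketch {

    // Minimal stand-in for one crawlDb entry.
    static final class Entry {
        String status = "unfetched";
        long nextFetchTime = 0L; // due immediately
    }

    // Arbitrary re-fetch interval, in the same abstract time units as "now".
    static final long FETCH_INTERVAL = 30L;

    /**
     * Runs `rounds` generate/fetch/update cycles over one forwarding URL and
     * returns how often it was fetched. If writeBackForwarder is false, the
     * entry is never updated after a fetch, which models the reported bug.
     */
    static int countFetches(boolean writeBackForwarder, int rounds) {
        Map<String, Entry> crawlDb = new HashMap<>();
        crawlDb.put("http://example.com/forward", new Entry());

        int fetches = 0;
        for (long now = 0; now < rounds; now++) {
            for (Entry entry : crawlDb.values()) {
                if (entry.nextFetchTime > now) {
                    continue; // not due yet, so not selected for this segment
                }
                fetches++; // the forwarding page is fetched in this round
                if (writeBackForwarder) {
                    // Assumed fix: record the new status and push the next fetch out.
                    entry.status = "fetched";
                    entry.nextFetchTime = now + FETCH_INTERVAL;
                }
            }
        }
        return fetches;
    }

    public static void main(String[] args) {
        System.out.println("without write-back: " + countFetches(false, 3));
        System.out.println("with write-back:    " + countFetches(true, 3));
    }
}
```

In this model, three rounds without write-back fetch the forwarding page three times, while writing the status and nextFetchTime back reduces that to a single fetch.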

      Attachments

        1. doNotRefecthForwarderPagesV1.patch
          0.7 kB
          Stefan Groschupf

            People

              Assignee: Andrzej Bialecki (ab)
              Reporter: Stefan Groschupf (joa23)
              Votes: 3
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: