Nutch / NUTCH-2776

Fetcher to temporarily deduplicate followed redirects

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.16
    • Fix Version/s: 1.17
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      If the fetcher follows redirects (http.redirect.max > 0), many redirects of a site may point to the same URL. In this situation, it would be useful if the fetcher could temporarily (for a configurable time period) deduplicate the redirect targets and skip all redirects except the first one. Typical examples of duplicated redirect targets are:

      • error pages reached via redirect, served instead of responding with HTTP status 404:
        /
        /resource-not-found
        /search/
        /404
        /error/not-found
        /err/notfound.html
      • a page to accept/decline cookies
        /cookie_usage.php
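
      The idea described above can be illustrated with a minimal sketch. It is not the code that was committed to Nutch: the class name RedirectDeduplicator, its constructor parameters (ttlMillis, maxEntries) and the method shouldFollow are hypothetical names chosen for illustration, and the caller is assumed to pass in the current time in milliseconds. The sketch remembers each redirect target for a limited time window and reports whether a redirect to that target should still be followed.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch): remembers recently followed
 * redirect targets so that repeated redirects to the same URL can be
 * skipped for a configurable time window.
 */
final class RedirectDeduplicator {

  private final long ttlMillis;  // assumed: how long a target counts as "already seen"
  private final int maxEntries;  // assumed: upper bound on remembered targets

  // Insertion-ordered map: redirect target URL -> time (ms) it was last followed.
  private final Map<String, Long> seen = new LinkedHashMap<>();

  RedirectDeduplicator(long ttlMillis, int maxEntries) {
    this.ttlMillis = ttlMillis;
    this.maxEntries = maxEntries;
  }

  /**
   * Returns true if the redirect to targetUrl should be followed at time
   * nowMillis, false if the same target was followed within the window.
   */
  synchronized boolean shouldFollow(String targetUrl, long nowMillis) {
    Long followedAt = seen.get(targetUrl);
    if (followedAt != null && nowMillis - followedAt < ttlMillis) {
      return false; // duplicate redirect target within the time window
    }
    // Remove before put so the entry moves to the end of the insertion order.
    seen.remove(targetUrl);
    seen.put(targetUrl, nowMillis);
    evict(nowMillis);
    return true;
  }

  /** Drops expired entries and, if the map is over capacity, the oldest ones. */
  private void evict(long nowMillis) {
    Iterator<Map.Entry<String, Long>> it = seen.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      boolean expired = nowMillis - entry.getValue() >= ttlMillis;
      if (expired || seen.size() > maxEntries) {
        it.remove();
      } else {
        break; // entries are ordered oldest first, so the rest are still valid
      }
    }
  }
}
```

      In this sketch a fetcher thread would call shouldFollow(target, System.currentTimeMillis()) before queuing a redirect and skip the redirect when the result is false. Bounding the map by both age and size keeps memory use predictable even if a crawl produces many distinct redirect targets.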
        

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Sebastian Nagel (snagel)