Nutch / NUTCH-2509

Inconsistent behavior in SitemapProcessor

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.15
    • Component/s: sitemap
    • Labels: None
    • Patch Info: Patch Available

    Description

      There are two inconsistent behaviors in SitemapProcessor:

      1. The member variable maxRedir is meant to limit the number of redirects followed for sitemap URLs, and it is initialized from the config property sitemap.redir.max. It is ignored in the code, however, because the relevant method declares a local variable with the same name, which shadows the member and is always set to 3.
      2. When a sitemap URL is redirected, it is filtered and normalized. A sitemap URL that comes from a sitemapindex is not. This seems inconsistent, since in both cases the URL comes from an outside source.
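
The two points can be illustrated with a minimal, self-contained Java sketch. This is not the Nutch source: the class SitemapRedirectSketch, its method names, and the filterAndNormalize callback are hypothetical stand-ins. Only the member name maxRedir, the hard-coded value 3, and the idea of filtering and normalizing sitemapindex entries come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the behavior described in the report.
class SitemapRedirectSketch {
    // In the report, this member is initialized from sitemap.redir.max.
    private final int maxRedir;
    // Assumed callback: returns the normalized URL, or null if it is filtered out.
    private final UnaryOperator<String> filterAndNormalize;

    SitemapRedirectSketch(int maxRedir, UnaryOperator<String> filterAndNormalize) {
        this.maxRedir = maxRedir;
        this.filterAndNormalize = filterAndNormalize;
    }

    // Point 1, as reported: a local variable shadows the member,
    // so the configured value is never consulted.
    int allowedRedirectsShadowed() {
        int maxRedir = 3; // shadows this.maxRedir
        return maxRedir;
    }

    // Point 1, one possible fix: drop the local and read the member.
    int allowedRedirectsFixed() {
        return maxRedir;
    }

    // Point 2, one possible fix: treat sitemapindex entries like redirect
    // targets by passing each one through the same filter/normalize step.
    List<String> acceptIndexEntries(List<String> urlsFromIndex) {
        List<String> accepted = new ArrayList<>();
        for (String url : urlsFromIndex) {
            String cleaned = filterAndNormalize.apply(url);
            if (cleaned != null) {
                accepted.add(cleaned);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Toy filter/normalizer: keep only http(s) URLs and lowercase them.
        SitemapRedirectSketch s = new SitemapRedirectSketch(
            5, u -> u.startsWith("http") ? u.toLowerCase(Locale.ROOT) : null);

        System.out.println(s.allowedRedirectsShadowed()); // prints 3
        System.out.println(s.allowedRedirectsFixed());    // prints 5
        System.out.println(s.acceptIndexEntries(Arrays.asList(
            "http://Example.COM/a.xml", "ftp://example.com/b.xml")));
        // prints [http://example.com/a.xml]
    }
}
```

With a configured limit of 5, the shadowed variant still reports 3, which is the symptom described in point 1. The last call shows an index entry being dropped by the filter and another being normalized, which is the treatment point 2 asks for.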

      Attachments

        1. SitemapProcessor.patch (1 kB, Yossi Tamari)

            People

              Assignee: Unassigned
              Reporter: Yossi Tamari
              Votes: 0
              Watchers: 4
