Nutch / NUTCH-2511

SitemapProcessor limited by http.content.limit

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.14
    • Fix Version/s: 1.17
    • Component/s: sitemap
    • Labels: None

    Description

      SitemapProcessor fetches sitemaps through the HTTP protocol plugin, which caps the size of every response at http.content.limit (64 KB by default). As a result, it can only handle sitemaps smaller than that limit.

      I don't believe that is what users intend when they set http.content.limit: they want to limit document size, not sitemap size. The sitemap specification explicitly allows sitemaps of up to 50 MB.
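
      The effect described above can be illustrated with a minimal, self-contained sketch. It is not Nutch code: the class and method names are hypothetical, the sitemap is synthetic, and the JDK's DOM parser stands in for whatever parser SitemapProcessor actually uses. It assumes only that the protocol layer keeps the first 64 KB of a response and discards the rest.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from Nutch.
class SitemapTruncationDemo {
    // The 64 KB default of http.content.limit mentioned in the issue.
    static final int CONTENT_LIMIT = 64 * 1024;

    // Build a synthetic sitemap with the given number of <url> entries.
    static byte[] buildSitemap(int urlCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (int i = 0; i < urlCount; i++) {
            sb.append("  <url><loc>https://example.com/page/")
              .append(i)
              .append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Assumption: the protocol layer keeps only the first `limit` bytes.
    static byte[] truncate(byte[] content, int limit) {
        return content.length <= limit ? content : Arrays.copyOf(content, limit);
    }

    // Return true if the bytes parse as well-formed XML.
    static boolean isWellFormed(byte[] xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] small = buildSitemap(10);    // well under the limit
        byte[] large = buildSitemap(5000);  // well over the limit

        for (byte[] sitemap : new byte[][] {small, large}) {
            byte[] fetched = truncate(sitemap, CONTENT_LIMIT);
            System.out.println("original=" + sitemap.length
                + " bytes, fetched=" + fetched.length
                + " bytes, wellFormed=" + isWellFormed(fetched));
        }
    }
}
```

      Once the document is cut at the limit, the closing urlset tag is lost, so a strict XML parser rejects the result. A lenient parser might still recover the URLs that arrived before the cut, but every entry past the limit would be missed either way.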

            People

              Assignee: Unassigned
              Reporter: Yossi Tamari
              Votes: 0
              Watchers: 3