  Nutch / NUTCH-2493

Add configuration parameter for sitemap processing to crawler script

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.15
    • Component/s: None
    • Labels:
      None

      Description

      While using the crawler script with the sitemap processing feature introduced in NUTCH-2491, I encountered performance issues when working with large sitemaps.
      Therefore, it should be possible to specify whether sitemap processing based on the HostDB takes place at all and, if so, how frequently it is done.
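      To make the proposed setting concrete, here is a minimal sketch of how a crawl loop could decide, round by round, whether to run HostDB-based sitemap processing. It is written in Python for illustration rather than in the crawl script's own shell, and the function name `should_process_sitemaps` and the setting values `never`, `always`, `once` and a positive integer interval are hypothetical; they are not necessarily the option names or semantics of the actual fix.

```python
from typing import Union

# Hypothetical frequency values, for illustration only; the real crawl
# script option may use different names or semantics.


def should_process_sitemaps(frequency: Union[str, int], round_no: int) -> bool:
    """Return True if HostDB-based sitemap processing should run in round `round_no` (1-based).

    frequency:
      "never"  -> never run (skips the cost of large sitemaps entirely)
      "always" -> run in every crawl round
      "once"   -> run only in the first round
      int n>0  -> run in round 1 and then every n-th round
    """
    if round_no < 1:
        raise ValueError("round_no is 1-based")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(frequency, int) and not isinstance(frequency, bool):
        if frequency < 1:
            raise ValueError("interval must be a positive integer")
        return (round_no - 1) % frequency == 0
    if frequency == "never":
        return False
    if frequency == "always":
        return True
    if frequency == "once":
        return round_no == 1
    raise ValueError(f"unknown frequency: {frequency!r}")


if __name__ == "__main__":
    # Show which of 7 crawl rounds would include sitemap processing.
    for setting in ("never", "always", "once", 3):
        rounds = [r for r in range(1, 8) if should_process_sitemaps(setting, r)]
        print(setting, rounds)
```

      With an interval of 3 over seven rounds, processing runs in rounds 1, 4 and 7, so the expensive sitemap step is amortized instead of repeated on every iteration.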

              People

              • Assignee:
                Moreno Feltscher (mfeltscher)
                Reporter:
                Moreno Feltscher (mfeltscher)
              • Votes:
                0
                Watchers:
                4

                Dates

                • Created:
                  Updated:
                  Resolved: