  1. Nutch
  2. NUTCH-2929

Fetcher: start threads slowly to avoid that resources are temporarily exhausted

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Implemented
    • Affects Version/s: 1.18
    • Fix Version/s: 1.19
    • Component/s: None
    • Labels: None
    • Patch Info: Patch Available

    Description

      The Fetcher spins up all threads without any delay. As a result, certain resources may be temporarily exhausted if all threads start fetching their first pages simultaneously.

      The issue was observed through Tika warnings about overuse of the SAXParser pool, which appeared only during the first 2-5 minutes of fetching a segment. See https://lists.apache.org/thread/lo6b9wdlxy2lz12wmosldgl9x9ov1cks for details. Adding a short delay between thread launches makes the warnings disappear.
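
      The idea of a short delay between thread launches can be sketched as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class name `StaggeredStart`, the method `startStaggered`, and the `delayMillis` parameter are hypothetical and do not refer to actual Nutch classes or configuration properties.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: starts worker threads one by one
// with a pause in between, instead of launching them all at once.
public class StaggeredStart {

    /**
     * Starts one thread per task and sleeps delayMillis between launches,
     * so that the first requests of the workers are spread over time.
     * Returns the started threads so the caller can join them.
     */
    static List<Thread> startStaggered(List<Runnable> tasks, long delayMillis)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            Thread t = new Thread(tasks.get(i), "worker-" + i);
            t.start();
            threads.add(t);
            // No pause is needed after the last thread has been launched.
            if (delayMillis > 0 && i < tasks.size() - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return threads;
    }
}
```

      With N threads and a delay of d milliseconds, the last thread starts roughly (N - 1) * d milliseconds after the first, so even a small delay spreads the initial load on shared resources such as a parser pool.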

            People

              Assignee: Unassigned
              Reporter: Sebastian Nagel (snagel)