Nutch / NUTCH-3058

Fetcher: counter for hung threads


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 1.20
    • Fix Version/s: 1.21
    • Component/s: fetcher
    • Labels: None
    • Patch Info: Patch Available

    Description

      The Fetcher class defines a "hard" timeout set to 50% of the MapReduce task timeout (see mapreduce.task.timeout and fetcher.threads.timeout.divisor). If fetcher threads are still running but no new fetch items have been started during the timeout period, Fetcher shuts down so that the task timeout is not reached and the fetcher job does not fail. The "hung threads" are logged together with the URL being fetched and, at DEBUG level, the Java stack trace.
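
      The timeout rule described above can be sketched roughly as follows. This is a minimal illustration, not the Fetcher code itself: the class name, method names, and the 600000 ms task timeout are assumptions chosen for the example; only the two property names come from the description.

```java
public class HungThreadTimeout {

    /**
     * Hard timeout = task timeout / divisor.
     * A divisor of 2 yields the 50% mentioned in the description.
     * (Hypothetical helper; the real Fetcher reads both values from the job configuration.)
     */
    static long hardTimeoutMillis(long taskTimeoutMillis, int divisor) {
        if (divisor < 1) {
            throw new IllegalArgumentException("divisor must be >= 1");
        }
        return taskTimeoutMillis / divisor;
    }

    /**
     * Threads count as hung when some are still active but no new fetch item
     * has been started for longer than the hard timeout.
     */
    static boolean isHung(int activeThreads, long lastFetchStartMillis,
                          long nowMillis, long hardTimeoutMillis) {
        return activeThreads > 0
            && (nowMillis - lastFetchStartMillis) > hardTimeoutMillis;
    }

    public static void main(String[] args) {
        long taskTimeout = 600_000L; // assumed value for mapreduce.task.timeout
        int divisor = 2;             // assumed value for fetcher.threads.timeout.divisor
        long hardTimeout = hardTimeoutMillis(taskTimeout, divisor);

        long now = System.currentTimeMillis();
        long lastFetchStart = now - 400_000L; // pretend the last fetch started 400 s ago
        System.out.println("hard timeout (ms): " + hardTimeout);
        System.out.println("hung: " + isHung(3, lastFetchStart, now, hardTimeout));
    }
}
```

      Keeping the hard timeout below the task timeout leaves the task time to shut down cleanly before Hadoop would kill it.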

      In addition to logging, a job counter should report the number of hung threads. This would make it possible to see at the job level whether hung threads are an issue. Tracing the cause still requires looking into the Hadoop task logs.
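
      A rough sketch of the proposed counting, under stated assumptions: the Counters class below is a plain-Java stand-in for Hadoop job counters, and the group/name pair "FetcherStatus"/"hungThreads" as well as FetchThreadState are hypothetical names used only for illustration, not taken from the patch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class HungThreadCounterSketch {

    /** Stand-in for job counters (assumption: not the Hadoop API). */
    static final class Counters {
        private final ConcurrentHashMap<String, LongAdder> values = new ConcurrentHashMap<>();

        void increment(String group, String name, long delta) {
            values.computeIfAbsent(group + "." + name, k -> new LongAdder()).add(delta);
        }

        long get(String group, String name) {
            LongAdder adder = values.get(group + "." + name);
            return adder == null ? 0L : adder.sum();
        }
    }

    /** Hypothetical snapshot of one fetcher thread at shutdown time. */
    static final class FetchThreadState {
        final String threadName;
        final String url;
        final boolean alive;

        FetchThreadState(String threadName, String url, boolean alive) {
            this.threadName = threadName;
            this.url = url;
            this.alive = alive;
        }
    }

    /** Logs every thread that is still alive and counts it as hung. */
    static int reportHungThreads(List<FetchThreadState> threads, Counters counters) {
        int hung = 0;
        for (FetchThreadState t : threads) {
            if (!t.alive) {
                continue; // thread finished normally, nothing to report
            }
            System.err.println("WARN hung thread " + t.threadName + " while fetching " + t.url);
            counters.increment("FetcherStatus", "hungThreads", 1); // hypothetical counter name
            hung++;
        }
        return hung;
    }

    public static void main(String[] args) {
        Counters counters = new Counters();
        List<FetchThreadState> threads = Arrays.asList(
            new FetchThreadState("FetcherThread-1", "https://example.org/a", true),
            new FetchThreadState("FetcherThread-2", "https://example.org/b", false));
        reportHungThreads(threads, counters);
        System.out.println("hungThreads = " + counters.get("FetcherStatus", "hungThreads"));
    }
}
```

      The counter only shows that hung threads occurred and how many; the URL and stack trace remain in the task logs.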

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Sebastian Nagel (snagel)
              Votes: 0
              Watchers: 3
