Uploaded image for project: 'Nutch'
  1. Nutch
  2. NUTCH-3072

Fetcher to stop QueueFeeder if aborting with "hung threads"

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.20
    • 1.21
    • fetcher
    • None

    Description

      Fetcher is shutting down if there is no progress (not a single URL fetched) during half of the MapReduce task timeout, see fetcher.thread.timeout.divisor and NUTCH-1057. Before the shut-down Fetcher reports the active FetcherThreads as "hung threads" and drops existing FetchQueues. After that the task continues with sorting and merging the spilled data. FetcherThreads and also the QueueFeeder might be still running in this moment, which opens potential concurrency issues when a FetcherThread writes output data while the output is already in the process of sorting.

      Fetcher should stop the QueueFeeder and/or make sure it isn't alive anymore. In addition, a short wait (one second) should help FetcherThreads to shut down.

      The issue was observed while testing a solution for NUTCH-3067.

      2024-10-02 09:33:18,796 INFO [main] fetcher.Fetcher: -activeThreads=120, spinWaiting=119, fetchQueues.totalSize=12000, fetchQueues.getQueueCount=9884
      2024-10-02 09:33:18,797 WARN [main] fetcher.Fetcher: Aborting with 120 hung threads.
      ...
      2024-10-02 09:33:18,828 WARN [main] fetcher.Fetcher: Aborting with 12000 queued fetch items in 9884 queues (queue feeder still alive).
      2024-10-02 09:33:18,828 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      ... (reporting dropped queues)
      2024-10-02 09:33:18,903 INFO [main] fetcher.FetchItemQueues: Emptied all queues: 9279 queues with 12000 items
      2024-10-02 09:33:18,906 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Starting flush of map output
      2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Spilling map output
      2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: bufstart = 124101175; bufend = 177094062; bufvoid = 314572800
      2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: kvstart = 31025288(124101152); kvend = 30983880(123935520); length = 41409/19660800
      2024-10-02 09:33:18,907 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      ...
      2024-10-02 09:33:19,292 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      2024-10-02 09:33:19,294 INFO [main] mapred.Merger: Merging 4 sorted segments
      2024-10-02 09:33:19,295 INFO [main] mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 979973 bytes
      2024-10-02 09:33:19,296 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      ...
      2024-10-02 09:33:19,478 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      2024-10-02 09:33:19,480 DEBUG [QueueFeeder] fetcher.QueueFeeder: -feeding 12000 input urls ...
      2024-10-02 09:33:19,480 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      2024-10-02 09:33:19,481 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ...
      2024-10-02 09:33:19,481 ERROR [QueueFeeder] fetcher.QueueFeeder: QueueFeeder error reading input, record 89118
      java.io.IOException: Stream closed
              at org.apache.hadoop.io.compress.DecompressorStream.checkStream(DecompressorStream.java:184)
              ...
              at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:120)
      ...
      2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1285 has no more work available
      2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1285 -finishing thread FetcherThread, activeThreads=104
      2024-10-02 09:33:19,507 INFO [main] mapred.Merger: Merging 4 sorted segments
      2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 319 has no more work available
      2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 319 -finishing thread FetcherThread, activeThreads=103
      2024-10-02 09:33:19,509 INFO [main] mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 959873 bytes
      2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1112 has no more work available
      2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1112 -finishing thread FetcherThread, activeThreads=102
      ...
      

      Later on, one of the reducer tasks failed when reading data. This caused the job to fail:

      2024-10-02 10:37:31,347 INFO mapreduce.Job:  map 100% reduce 92%
      2024-10-02 10:37:31,347 INFO mapreduce.Job: Task Id : attempt_1727715735602_0060_r_000007_1, Status : FAILED
      Error: java.lang.ArrayIndexOutOfBoundsException: Index 114 out of bounds for length 25
              at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:49)
      ...
      2024-10-02 11:08:34,260 ERROR fetcher.Fetcher: Fetcher: java.lang.RuntimeException: Fetcher job did not succeed, job id: job_1727715735602_0060, job status: FAILED, reason: Task failed task_1727715735602_0060_r_000007
      Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0
      

      Attachments

        Issue Links

          Activity

            People

              snagel Sebastian Nagel
              snagel Sebastian Nagel
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: