Description
Fetcher is shutting down if there is no progress (not a single URL fetched) during half of the MapReduce task timeout, see fetcher.thread.timeout.divisor and NUTCH-1057. Before the shut-down Fetcher reports the active FetcherThreads as "hung threads" and drops existing FetchQueues. After that the task continues with sorting and merging the spilled data. FetcherThreads and also the QueueFeeder might be still running in this moment, which opens potential concurrency issues when a FetcherThread writes output data while the output is already in the process of sorting.
Fetcher should stop the QueueFeeder and/or make sure it isn't alive anymore. In addition, a short wait (one second) should help FetcherThreads to shut down.
The issue was observed while testing a solution for NUTCH-3067.
2024-10-02 09:33:18,796 INFO [main] fetcher.Fetcher: -activeThreads=120, spinWaiting=119, fetchQueues.totalSize=12000, fetchQueues.getQueueCount=9884 2024-10-02 09:33:18,797 WARN [main] fetcher.Fetcher: Aborting with 120 hung threads. ... 2024-10-02 09:33:18,828 WARN [main] fetcher.Fetcher: Aborting with 12000 queued fetch items in 9884 queues (queue feeder still alive). 2024-10-02 09:33:18,828 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... ... (reporting dropped queues) 2024-10-02 09:33:18,903 INFO [main] fetcher.FetchItemQueues: Emptied all queues: 9279 queues with 12000 items 2024-10-02 09:33:18,906 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Starting flush of map output 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: Spilling map output 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: bufstart = 124101175; bufend = 177094062; bufvoid = 314572800 2024-10-02 09:33:18,906 INFO [main] mapred.MapTask: kvstart = 31025288(124101152); kvend = 30983880(123935520); length = 41409/19660800 2024-10-02 09:33:18,907 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... ... 2024-10-02 09:33:19,292 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... 2024-10-02 09:33:19,294 INFO [main] mapred.Merger: Merging 4 sorted segments 2024-10-02 09:33:19,295 INFO [main] mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 979973 bytes 2024-10-02 09:33:19,296 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... ... 2024-10-02 09:33:19,478 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... 2024-10-02 09:33:19,480 DEBUG [QueueFeeder] fetcher.QueueFeeder: -feeding 12000 input urls ... 2024-10-02 09:33:19,480 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... 2024-10-02 09:33:19,481 DEBUG [FetcherThread] fetcher.FetcherThread: FetcherThread spin-waiting ... 2024-10-02 09:33:19,481 ERROR [QueueFeeder] fetcher.QueueFeeder: QueueFeeder error reading input, record 89118 java.io.IOException: Stream closed at org.apache.hadoop.io.compress.DecompressorStream.checkStream(DecompressorStream.java:184) ... at org.apache.nutch.fetcher.QueueFeeder.run(QueueFeeder.java:120) ... 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1285 has no more work available 2024-10-02 09:33:19,507 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1285 -finishing thread FetcherThread, activeThreads=104 2024-10-02 09:33:19,507 INFO [main] mapred.Merger: Merging 4 sorted segments 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 319 has no more work available 2024-10-02 09:33:19,508 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 319 -finishing thread FetcherThread, activeThreads=103 2024-10-02 09:33:19,509 INFO [main] mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 959873 bytes 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1112 has no more work available 2024-10-02 09:33:19,509 INFO [FetcherThread] fetcher.FetcherThread: FetcherThread 1112 -finishing thread FetcherThread, activeThreads=102 ...
Later on, one of the reducer tasks failed when reading data. This caused the job to fail:
2024-10-02 10:37:31,347 INFO mapreduce.Job: map 100% reduce 92% 2024-10-02 10:37:31,347 INFO mapreduce.Job: Task Id : attempt_1727715735602_0060_r_000007_1, Status : FAILED Error: java.lang.ArrayIndexOutOfBoundsException: Index 114 out of bounds for length 25 at org.apache.nutch.util.GenericWritableConfigurable.readFields(GenericWritableConfigurable.java:49) ... 2024-10-02 11:08:34,260 ERROR fetcher.Fetcher: Fetcher: java.lang.RuntimeException: Fetcher job did not succeed, job id: job_1727715735602_0060, job status: FAILED, reason: Task failed task_1727715735602_0060_r_000007 Job failed as tasks failed. failedMaps:0 failedReduces:1 killedMaps:0 killedReduces: 0
Attachments
Issue Links
- links to