NUTCH-2395

Cannot run job worker! - error while running multiple crawling jobs in parallel

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 2.3.1
    • Fix Version/s: 2.5
    • Component/s: generator, nutch server
    • Labels: None
    • Environment: Ubuntu 16.04 64-bit
      Oracle Java 8 64-bit
      Nutch 2.3.1 (standalone deployment)
      MongoDB 3.4

    Description

      My application executes multiple Nutch jobs in parallel using the Nutch REST services. The application injects a seed URL and then repeats the GENERATE/FETCH/PARSE/UPDATEDB sequence a requested number of times to emulate continuous crawling (each step in the sequence runs after the previous step completes successfully, and then the whole sequence is repeated). Here is a brief description of the jobs:

      • Number of parallel jobs: 7
      • Each job has a unique crawl ID and its own MongoDB collection
      • Seed URL for all jobs: http://www.cnn.com
      • Regex URL filters for all jobs:
        • "-^.{1000,}$" - exclude very long URLs
        • "+." - include the rest
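
      The effect of the two filter rules above can be sketched as follows. This is a minimal, self-contained illustration, not Nutch code: the class RegexRuleSketch and its helpers are hypothetical, and it assumes that rules are tried in order and that the first rule whose pattern is found in the URL decides the outcome.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexRuleSketch {

    /** One rule line: a sign ('+' accept, '-' reject) followed by a regular expression. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(String line) {
            this.accept = line.charAt(0) == '+';
            this.pattern = Pattern.compile(line.substring(1));
        }
    }

    /** Parses rule lines in the order given. */
    static List<Rule> parse(String... lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            rules.add(new Rule(line));
        }
        return rules;
    }

    /** Assumption: the first rule whose pattern is found in the URL decides. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false; // assumption: a URL that matches no rule is dropped
    }

    /** Builds a URL with the given number of padding characters (Java 8 compatible). */
    static String paddedUrl(int padding) {
        StringBuilder sb = new StringBuilder("http://www.cnn.com/");
        for (int i = 0; i < padding; i++) {
            sb.append('a');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Rule> rules = parse("-^.{1000,}$", "+.");
        System.out.println(accepts(rules, "http://www.cnn.com")); // true
        System.out.println(accepts(rules, paddedUrl(1000)));      // false
    }
}
```

      Under these assumptions the seed URL passes through the second rule, while any URL of 1000 characters or more is rejected by the first rule before "+." is consulted.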

      The jobs start as expected, but at some point some of them fail with a "Cannot run job worker!" error. For more details, see the job status and hadoop.log lines below.

      In the debugger at the time of the crash, I noticed that a single instance of SelectorEntryComparator (a class nested in GeneratorJob) is shared across multiple reducer tasks. The class extends org.apache.hadoop.io.WritableComparator, which has a few members that are not protected against concurrent use. At some point multiple threads may access those members in the WritableComparator.compare call. I modified SelectorEntryComparator and that seems to have solved the problem, but I am not sure whether the change is appropriate and/or sufficient (does it cover GENERATE only?).

      Original code:

      public static class SelectorEntryComparator extends WritableComparator {
          public SelectorEntryComparator() {
            super(SelectorEntry.class, true);
          }
      }
      

      Modified code:

      public static class SelectorEntryComparator extends WritableComparator {
          public SelectorEntryComparator() {
            super(SelectorEntry.class, true);
          }

          // Serialize calls so that concurrent tasks cannot interleave on the
          // members that the shared WritableComparator instance reuses.
          @Override
          public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return super.compare(b1, s1, l1, b2, s2, l2);
          }
      }
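
      To illustrate the suspected race without a Hadoop dependency, here is a minimal sketch. All class names in it are hypothetical stand-ins: ReusingComparator only imitates a comparator that deserializes into instance fields reused between calls, and SynchronizedComparator mirrors the modified code above. It is not the Hadoop implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedComparatorSketch {

    /** Hypothetical stand-in for a comparator that reuses instance fields between calls. */
    static class ReusingComparator {
        private String key1; // shared scratch state, reused on every call
        private String key2;

        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            key1 = new String(b1, s1, l1, StandardCharsets.UTF_8);
            key2 = new String(b2, s2, l2, StandardCharsets.UTF_8);
            // Without synchronization another thread may overwrite key1/key2 here.
            return key1.compareTo(key2);
        }
    }

    /** Same idea as the modified code: serialize access to the shared fields. */
    static class SynchronizedComparator extends ReusingComparator {
        @Override
        public synchronized int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return super.compare(b1, s1, l1, b2, s2, l2);
        }
    }

    /** Calls one shared comparator from several threads; returns the number of wrong results. */
    static int countWrongResults(ReusingComparator shared, int threads, int iterations)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Even threads compare ("a", "b") and expect < 0; odd threads the reverse.
                final byte[] left = (t % 2 == 0 ? "a" : "b").getBytes(StandardCharsets.UTF_8);
                final byte[] right = (t % 2 == 0 ? "b" : "a").getBytes(StandardCharsets.UTF_8);
                final int expectedSign = (t % 2 == 0) ? -1 : 1;
                futures.add(pool.submit(() -> {
                    int wrong = 0;
                    for (int i = 0; i < iterations; i++) {
                        int r = shared.compare(left, 0, left.length, right, 0, right.length);
                        if (Integer.signum(r) != expectedSign) {
                            wrong++;
                        }
                    }
                    return wrong;
                }));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int wrong = countWrongResults(new SynchronizedComparator(), 4, 10_000);
        System.out.println("wrong results with synchronized compare: " + wrong);
    }
}
```

      With SynchronizedComparator the count is always 0, because each call reads and compares its keys while holding the lock. Passing a plain ReusingComparator instead may or may not produce wrong results on a given run, which is consistent with the intermittent failures described here. The trade-off is that synchronizing serializes every comparison made through the shared instance.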
      

      Example of a failed job status:

      {
      "id" : "parallel_0-65ff2f1b-382e-4eb2-a813-a0370b84d5b6-GENERATE-1961495833",
      "type" : "GENERATE",
      "confId" : "65ff2f1b-382e-4eb2-a813-a0370b84d5b6",
      "args" : { "topN" : "100" },
      "result" : null,
      "state" : "FAILED",
      "msg" : "ERROR: java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036",
      "crawlId" : "parallel_0"
      }
      

      Lines from hadoop.log:

      2017-06-21 11:45:13,021 WARN  mapred.LocalJobRunner - job_local1142434549_0036
      java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
                      at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
                      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
      Caused by: java.lang.RuntimeException: java.io.EOFException
                      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
                      at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
                      at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKey(ReduceContextImpl.java:121)
                      at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.nextKey(WrappedReducer.java:302)
                      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170)
                      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
                      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
                      at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
                      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.EOFException
                      at java.io.DataInputStream.readFully(DataInputStream.java:197)
                      at org.apache.hadoop.io.Text.readString(Text.java:466)
                      at org.apache.hadoop.io.Text.readString(Text.java:457)
                      at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
                      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
                      ... 12 more
      2017-06-21 11:45:13,058 WARN  mapred.LocalJobRunner - job_local1976432650_0038
      java.lang.Exception: java.lang.RuntimeException: java.io.EOFException
                      at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
                      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
      Caused by: java.lang.RuntimeException: java.io.EOFException
                      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:164)
                      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1245)
                      at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:99)
                      at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:126)
                      at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:63)
                      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1575)
                      at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
                      at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
                      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
                      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
                      at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
                      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                      at java.lang.Thread.run(Thread.java:745)
      Caused by: java.io.EOFException
                      at java.io.DataInputStream.readByte(DataInputStream.java:267)
                      at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:308)
                      at org.apache.hadoop.io.WritableUtils.readVIntInRange(WritableUtils.java:348)
                      at org.apache.hadoop.io.Text.readString(Text.java:464)
                      at org.apache.hadoop.io.Text.readString(Text.java:457)
                      at org.apache.nutch.crawl.GeneratorJob$SelectorEntry.readFields(GeneratorJob.java:92)
                      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
                      ... 15 more
      
      2017-06-21 11:45:13,372 ERROR impl.JobWorker - Cannot run job worker!
      java.lang.RuntimeException: job failed: name=[parallel_0]generate: 1498059912-1448058551, jobid=job_local1142434549_0036
                      at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:120)
                      at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:227)
                      at org.apache.nutch.api.impl.JobWorker.run(JobWorker.java:64)
                      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                      at java.lang.Thread.run(Thread.java:745)
      
      

      People

        Assignee: Unassigned
        Reporter: Vyacheslav Pascarel