Hadoop Map/Reduce
MAPREDUCE-133

Getting errors in reading the output files of a map/reduce job immediately after the job is complete


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      I have an app that fires up map/reduce jobs sequentially; the output of one job is the input of the next.
      I observed that many map tasks failed due to file read errors:

      java.rmi.RemoteException: java.io.IOException: Cannot open filename /user/runping/runping/docs_store/stage_2/base_docs/part-00186
          at org.apache.hadoop.dfs.NameNode.open(NameNode.java:130)
          at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:585)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:237)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:216)
          at org.apache.hadoop.ipc.Client.call(Client.java:303)
          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:141)
          at org.apache.hadoop.dfs.$Proxy1.open(Unknown Source)
          at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:315)
          at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:302)
          at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:95)
          at org.apache.hadoop.dfs.DistributedFileSystem.openRaw(DistributedFileSystem.java:78)
          at org.apache.hadoop.fs.FSDataInputStream$Checker.<init>(FSDataInputStream.java:46)
          at org.apache.hadoop.fs.FSDataInputStream.<init>(FSDataInputStream.java:220)
          at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:146)
          at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:234)
          at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:226)
          at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:36)
          at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:53)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:105)
          at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:709)

      Those tasks succeeded on the second or third try.
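
      For context, the stages are chained in the straightforward way: each job's output directory becomes the input directory of the next job, read back through SequenceFileInputFormat (visible in the stack trace above). Below is a minimal sketch of that pattern against the old org.apache.hadoop.mapred API; the class name, paths, and stage names are illustrative, not the actual ones from my app.

      import java.io.IOException;

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.SequenceFileInputFormat;
      import org.apache.hadoop.mapred.SequenceFileOutputFormat;

      public class ChainedStages {
          public static void main(String[] args) throws IOException {
              // Stage 1 writes SequenceFiles that stage 2 will read.
              Path stage1Out = new Path("/user/runping/docs_store/stage_1/base_docs");
              JobConf stage1 = new JobConf(ChainedStages.class);
              stage1.setJobName("stage_1");
              stage1.setInputPath(new Path("/user/runping/docs_store/input"));
              stage1.setOutputPath(stage1Out);
              stage1.setOutputFormat(SequenceFileOutputFormat.class);
              JobClient.runJob(stage1);   // blocks until the job is reported complete

              // Stage 2 reads exactly the files stage 1 just wrote.
              JobConf stage2 = new JobConf(ChainedStages.class);
              stage2.setJobName("stage_2");
              stage2.setInputFormat(SequenceFileInputFormat.class);
              stage2.setInputPath(stage1Out);
              stage2.setOutputPath(new Path("/user/runping/docs_store/stage_2/base_docs"));
              JobClient.runJob(stage2);   // the "Cannot open filename" errors surface here
          }
      }

      The sketch uses the blocking JobClient.runJob; my app uses the hand-rolled polling loop shown below, but either way the next job is submitted only after the previous one has reported complete, which is exactly when the read errors appear.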

      After inserting a 10-second sleep between consecutive jobs, the problem disappeared.

      Here is my code for detecting when a job has completed:

      RunningJob running = null;
      boolean success = false;
      try {
          running = jc.submitJob(job);
          String jobId = running.getJobID();
          System.out.println("start job:\t" + jobId);
          // Poll once per second until the job reports completion.
          while (!running.isComplete()) {
              try {
                  Thread.sleep(1000);
              } catch (InterruptedException e) {}
              // Refresh the job status from the JobTracker.
              running = jc.getJob(jobId);
          }
          success = running.isSuccessful();
      } finally {
          // Kill the job if it did not finish cleanly, then release the client.
          if (!success && (running != null)) {
              running.killJob();
          }
          jc.close();
      }
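
      Putting it together, the sequential driver with the workaround looks roughly like the sketch below; SequentialDriver, runAndWait, and runSequence are illustrative names of mine wrapping the same JobClient calls as above, with the 10-second sleep inserted between stages.

      import java.io.IOException;

      import org.apache.hadoop.mapred.JobClient;
      import org.apache.hadoop.mapred.JobConf;
      import org.apache.hadoop.mapred.RunningJob;

      public class SequentialDriver {
          /** Submit a job, poll until it completes, and kill it if it failed. */
          static boolean runAndWait(JobClient jc, JobConf job) throws IOException {
              RunningJob running = jc.submitJob(job);
              String jobId = running.getJobID();
              while (!running.isComplete()) {
                  try {
                      Thread.sleep(1000);
                  } catch (InterruptedException e) {}
                  running = jc.getJob(jobId);
              }
              boolean success = running.isSuccessful();
              if (!success) {
                  running.killJob();
              }
              return success;
          }

          static void runSequence(JobClient jc, JobConf[] stages) throws IOException {
              for (int i = 0; i < stages.length; i++) {
                  if (!runAndWait(jc, stages[i])) {
                      throw new IOException("job failed: " + stages[i].getJobName());
                  }
                  // Workaround: give DFS time to make this stage's output
                  // readable before the next stage tries to open it.
                  try {
                      Thread.sleep(10000);
                  } catch (InterruptedException e) {}
              }
          }
      }

      With the Thread.sleep(10000) between stages the next job's map tasks open the files without error; without it, they hit the IOException shown above.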


    People

      Assignee: Owen O'Malley (omalley)
      Reporter: Runping Qi (runping)
      Votes: 0
      Watchers: 0
