HADOOP-2391 (Hadoop Common)

Speculative Execution race condition with output paths


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.16.1
    • Component/s: None
    • Labels: None
    • Environment: all

    Description

      I am tracking a problem where, with speculative execution enabled, there is a race condition when reading the output paths of a previously completed job. More specifically, when reduce tasks run, their output is written to a working directory under the task name until the task is completed. The directory name is something like workdir/_taskid. Upon completion the output gets moved into workdir. Regular tasks are checked for this move and are not considered complete until it has been made.

      I have not verified it, but all indications point to speculative tasks NOT having this same completion check and, more importantly, their working directories not being removed when they are killed. So when a chained job reads the output directory of a previous job that ran with speculative execution enabled, a leftover workdir/_taskid may still be present. Here is an error that supports my theory:

      Generator: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /u01/hadoop/mapred/temp/generate-temp-1197104928603/_task_200712080949_0005_r_000014_1
      at org.apache.hadoop.dfs.NameNode.open(NameNode.java:234)
      at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:389)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:644)
      at org.apache.hadoop.ipc.Client.call(Client.java:507)
      at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:186)
      at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
      at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
      at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:839)
      at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:831)
      at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:263)
      at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:114)
      at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1356)
      at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
      at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
      at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:87)
      at org.apache.nutch.crawl.Generator.generate(Generator.java:429)
      at org.apache.nutch.crawl.Generator.run(Generator.java:563)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
      at org.apache.nutch.crawl.Generator.main(Generator.java:526)

      I will continue to research this and post as I make progress on tracking down this bug.
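
      To make the suspected failure mode concrete, here is a minimal sketch in plain Java (java.nio only, not the Hadoop FileSystem API and not the attached patches). It assumes the layout described above: committed output sits directly in the output directory and a killed speculative attempt leaves a directory whose name starts with "_". The class OutputDirSketch, its methods, and the file name part-00014 are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OutputDirSketch {

    /** Hypothetical check: entries starting with "_" are treated as task work directories. */
    static boolean isTaskWorkDir(Path p) {
        return p.getFileName().toString().startsWith("_");
    }

    /** Lists the entries of outputDir, skipping any leftover _taskid directories. */
    static List<Path> committedOutputs(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries.filter(p -> !isTaskWorkDir(p))
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    /** Deletes the work directory of a killed attempt, children before parents. */
    static void discardAttempt(Path attemptDir) throws IOException {
        if (!Files.exists(attemptDir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(attemptDir)) {
            List<Path> deepestFirst = walk.sorted(Comparator.reverseOrder())
                                          .collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate an output directory holding one committed file and one leftover attempt.
        Path out = Files.createTempDirectory("generate-temp");
        Files.createFile(out.resolve("part-00014"));
        Path leftover = Files.createDirectory(out.resolve("_task_200712080949_0005_r_000014_1"));
        Files.createFile(leftover.resolve("part-00014"));

        // A reader that filters by name only sees the committed file.
        System.out.println(committedOutputs(out));

        // Cleaning up the killed attempt removes the stray directory entirely.
        discardAttempt(leftover);
        System.out.println(Files.exists(leftover));
    }
}
```

      The sketch shows two independent mitigations: the reader can ignore "_"-prefixed entries, or the framework can delete the attempt directory when the speculative task is killed. Which of these the eventual fix uses is not stated in this report.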

      Attachments

        1. HADOOP-2391-1-20071211.patch
          2 kB
          Dennis Kubes
        2. patch-2391.txt
          16 kB
          Amareshwari Sriramadasu

            People

              Assignee: Amareshwari Sriramadasu (amareshwari)
              Reporter: Dennis Kubes (musepwizard)
              Votes: 0
              Watchers: 6
