Hadoop Map/Reduce / MAPREDUCE-3638

Yarn trying to download cacheFile to container but Path is a local file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.23.0
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels:
      None

      Description

      It looks like the ApplicationMaster (AM), which is running on host1.com,
      is trying to access a local file, but the file is on host2.com (where the
      command was run).

      ran:
      hadoop --config conf/hadoop/ \
        jar hadoop-streaming.jar -Dmapreduce.job.acl-view-job=* \
        -input Streaming/streaming-610/input.txt \
        -mapper 'xargs cat' -reducer cat \
        -output Streaming/streaming-610/Output \
        -cacheFile file://Streaming/data/streaming-610//InputFile#testlink \
        -jobconf mapred.map.tasks=1 -jobconf mapred.reduce.tasks=1 \
        -jobconf mapred.job.name=streamingTest-610 \
        -jobconf mapreduce.job.acl-view-job=*

      failure:

      11/11/10 07:48:06 INFO mapreduce.Job: Job job_1320887371559_0215 failed with state FAILED due to: Application application_1320887371559_0215 failed 1 times due to AM Container for appattempt_1320887371559_0215_000001 exited with exitCode: -1000 due to: java.io.FileNotFoundException: File file:/Streaming/data/streaming-610/InputFile does not exist
      at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
      at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
      at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:85)
      at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:152)
      at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:619)
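The failure comes down to what a file: URI records. The Java sketch below illustrates this. It is not Hadoop code, and the class and helper names (LocalUriResolution, stripFragment, existsOnThisHost) are hypothetical.

The sketch parses a cache URI with java.net.URI and shows that the URI carries no host. It then checks for the file with java.nio.file. Like the local file system in the stack trace, that check can only consult the disk of the machine it runs on.

The sketch assumes the file:/// form used in the follow-up comments, not the exact string in the command.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Illustrative sketch only (not Hadoop code): a file: URI is resolved on
 * whichever host dereferences it, not on the host that submitted the job.
 */
public class LocalUriResolution {

    /** Drops the "#linkname" fragment, keeping scheme, authority and path. */
    static URI stripFragment(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(), uri.getPath(), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** True if the file: URI names a file on the local disk of THIS JVM's host. */
    static boolean existsOnThisHost(URI fileUri) {
        if (!"file".equals(fileUri.getScheme())) {
            throw new IllegalArgumentException("expected a file: URI, got " + fileUri);
        }
        // Paths.get(URI) maps straight onto this machine's default file system;
        // nothing in the URI says which host the file originally lived on.
        return Files.exists(Paths.get(stripFragment(fileUri)));
    }

    public static void main(String[] args) {
        URI cacheFile = URI.create("file:///Streaming/data/streaming-610/InputFile#testlink");
        System.out.println("scheme   = " + cacheFile.getScheme());   // file
        System.out.println("host     = " + cacheFile.getHost());     // null: no host recorded
        System.out.println("symlink  = " + cacheFile.getFragment()); // testlink
        System.out.println("resolved = " + stripFragment(cacheFile));
        System.out.println("exists on this host? " + existsOnThisHost(cacheFile));
    }
}
```

Run on host2.com, the last line would report true. Run on host1.com, where the AM container was launched, it would report false, which corresponds to the FileNotFoundException above.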

        Activity

        Arun C Murthy added a comment -

        This looks like a very long-standing bug; this code hasn't changed since 2009...

        Philip Su added a comment -

        I did some more follow up testing on this and I think I know more precisely where the problem is.

        1) The failure occurs when running a streaming job with the -cacheFile option on a local file system using file:///<path>.
        2) I ran hdfs dfs -ls file:///<path> to make sure the file exists.
        3) I ran the same streaming job with the same value from 1), but instead of the deprecated -cacheFile option, I used -files. The job ran and passed.

        So it seems that when a streaming job uses the deprecated -cacheFile option with a file on the local file system, it is not getting the correct file permission on it.

        Mahadev konar added a comment -

        Thanks Philip! That's helpful. Given that we have a workaround (use -files, which is also the more common option in streaming), I think this might not be that urgent to fix.

        Philip Su added a comment -

        It's not urgent. We do have 4 regression tests blocked by this, so it would be good to have this fixed at some point in the near future. Thanks!

        Ramya Sunil added a comment -

        -cacheFile was never supported for the local FS; it downloads files from HDFS only. It is a deprecated option, and the -files option has to be used to ship files from the local FS. This is not an issue.
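The resolution hinges on where each option makes the containers fetch from. The Java sketch below models this in simplified form. It is hypothetical and is not Hadoop's real classes.

The model assumes the following:

- With -files, the client first copies a local file into a "files" subdirectory of the job's staging directory on the shared file system.
- With -cacheFile, the URI is passed to the nodes unchanged.
- Every non-file: scheme counts as readable from every node.

The staging URI, the host "namenode:8020", the user "alice" and the job directory name are invented for illustration.

```java
import java.net.URI;

/**
 * Hypothetical model (not Hadoop's real classes) of where a container would
 * fetch a cache entry from, depending on how it was passed to streaming.
 */
public class CacheOptionModel {

    enum Option { CACHE_FILE, FILES }

    /**
     * FILES: a local file is assumed to be uploaded to stagingDir/files/ first,
     * so nodes fetch that copy. CACHE_FILE: the URI is handed over as-is.
     */
    static URI localizationSource(URI userUri, Option option, URI stagingDir) {
        if (option == Option.FILES && "file".equals(userUri.getScheme())) {
            String path = userUri.getPath();
            String name = path.substring(path.lastIndexOf('/') + 1);
            // stagingDir must end with '/' so resolve() appends rather than replaces.
            return stagingDir.resolve("files/" + name);
        }
        return userUri;
    }

    /** Simplification: a file: URI is only meaningful on the host holding the file. */
    static boolean readableFromEveryNode(URI source) {
        return !"file".equals(source.getScheme());
    }

    public static void main(String[] args) {
        URI staging = URI.create("hdfs://namenode:8020/user/alice/.staging/job_0001/");
        URI local = URI.create("file:///Streaming/data/streaming-610/InputFile");
        for (Option opt : Option.values()) {
            URI src = localizationSource(local, opt, staging);
            System.out.println(opt + " -> " + src
                    + " (readable from every node: " + readableFromEveryNode(src) + ")");
        }
    }
}
```

Under this model, only the -files route yields a source that every NodeManager can read, which is consistent with the -files workaround passing.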

        Arun C Murthy added a comment -

        Thanks Ramya. Resolving this.


          People

          • Assignee: Unassigned
          • Reporter: Thomas Graves
          • Votes: 0
          • Watchers: 3

            Dates

            • Created:
              Updated:
              Resolved:
