Hadoop Map/Reduce / MAPREDUCE-1697

Document the behavior of the -file option in streaming and deprecate it in favor of the generic -files option.



    • Reviewed
    • Documented the behavior of the -file option in streaming and deprecated it in favor of the generic -files option.
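A deprecated invocation can be migrated mechanically in simple cases. The sketch below is only an illustration under stated assumptions: `migrate_file_options` is a hypothetical helper, not part of Hadoop, and it relies on the general convention that generic options such as -files take a comma-separated list and must come before the streaming-specific options. It does not try to reproduce the classes/lib handling that -file applies to .class and .jar files, and it ignores any -files option already present.

```python
def migrate_file_options(streaming_args):
    """Rewrite repeated '-file X' options into a single leading '-files X,Y'.

    Hypothetical helper for illustration only; it is not a Hadoop API.
    """
    files = []
    rest = []
    it = iter(streaming_args)
    for arg in it:
        if arg == "-file":
            # The value of -file is the next argument.
            files.append(next(it))
        else:
            rest.append(arg)
    if not files:
        return rest
    # Generic options go first, with files joined by commas.
    return ["-files", ",".join(files)] + rest


if __name__ == "__main__":
    old = ["-input", "in", "-output", "out",
           "-mapper", "map.pl", "-file", "map.pl", "-file", "dict.txt"]
    print(migrate_file_options(old))
```

The returned list is meant to be appended after the streaming jar on the command line; the jar path itself is installation-specific and is left out here.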


      The behavior of the -file option in streaming is not documented anywhere.
      The behavior of -file is as follows:
      1) All files passed through the -file option are packaged into job.jar.
      2) If the -file option is used for .class or .jar files, they are unjarred on the tasktracker and placed in ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/classes or ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/lib, respectively. Symlinks to the classes and lib directories are created in the cwd of the task, named "classes" and "lib". The names of the .class and .jar files themselves therefore do not appear in the cwd of the task.
      Paths to these files are automatically added to the classpath. The tricky part is that the Hadoop framework can find the .class or .jar files through the classpath, but the actual mapper script cannot. To access these .class or .jar files from inside a script, do something like "java -cp 'lib/*:classes' <ClassName>" (on Unix the classpath separator is ":", and the quoted lib/* wildcard makes java pick up every jar under lib).
      3) If the -file option is used for files other than .class or .jar (e.g., .txt or .pl), these files are unjarred into ${mapred.local.dir}/taskTracker/jobcache/job_ID/jars/. Symlinks to these files are created in the cwd of the task, and each symlink has the same name as its file.
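The three rules above can be summarized as a small model. This is a sketch for illustration, not Hadoop code: `placement` and `java_command` are hypothetical names, the local directory and job ID are placeholders supplied by the caller, and the ":" separator assumes Unix-like task nodes.

```python
import posixpath


def placement(file_name, local_dir, job_id):
    """Return (directory the file is unjarred into, symlink name in the task cwd).

    Models the documented -file behavior only; hypothetical helper.
    """
    jars_dir = posixpath.join(local_dir, "taskTracker", "jobcache", job_id, "jars")
    base = posixpath.basename(file_name)
    if base.endswith(".class"):
        # .class files land under jars/classes; only "classes" is linked in cwd.
        return posixpath.join(jars_dir, "classes"), "classes"
    if base.endswith(".jar"):
        # .jar files land under jars/lib; only "lib" is linked in cwd.
        return posixpath.join(jars_dir, "lib"), "lib"
    # Any other file is linked in cwd under its own name.
    return jars_dir, base


def java_command(class_name):
    """Build the argv a mapper script could use to run a shipped class.

    "lib/*" is expanded by java itself to every jar under lib/.
    """
    return ["java", "-cp", ":".join(["lib/*", "classes"]), class_name]


if __name__ == "__main__":
    for name in ["Foo.class", "util.jar", "words.txt"]:
        print(name, placement(name, "/tmp/mapred/local", "job_ID"))
    print(java_command("Foo"))
```

Passing the argument list from `java_command` to a process launcher without a shell avoids accidental glob expansion of `lib/*`.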


        1. patch-1697-3.txt (4 kB, Amareshwari Sriramadasu)
        2. patch-1697-2.txt (4 kB, Amareshwari Sriramadasu)
        3. patch-1697-1.txt (1 kB, Amareshwari Sriramadasu)
        4. patch-1697.txt (1 kB, Amareshwari Sriramadasu)

              amareshwari Amareshwari Sriramadasu