Hadoop Map/Reduce
MAPREDUCE-596

can't package zip file with hadoop streaming -file argument

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels:
      None

      Description

      I'm unable to ship a file with a .zip suffix to the mapper using the -file argument of Hadoop streaming. I am able to ship it if I change the suffix to .zipp. Is this a bug, or does it perhaps have something to do with the jar file format used to send files to the instance?

      For example, with this Hadoop invocation and local files "/tmp/boto.zip" and "/tmp/boto.zipp", which are copies of each other:

      $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
          -mapper $KCLUSTER_SRC/testmapper.py -reducer $KCLUSTER_SRC/testreducer.py \
          -input input/foo -output output \
          -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp

      I see this line in the invocation standard output:

      packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp, /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null

      But in the current directory of the mapper process, "boto.zip" does not exist, while "boto.zipp" does.

        Activity

        Trevor Rundell added a comment:

        Apparently this issue is still around. When trying to distribute a .zip file with -file, I end up with a job jar structure something like this:

        Archive: job_201004151121_0002.jar
        inflating: load_diff.py
        inflating: getmaps.py
        inflating: lib/warehouse.zip
        inflating: envs.cfg
        ...

        For some reason, the zip file ends up in the lib/ directory. When I change the extension to .zipp, the file ends up at the top level, as I'd expect.

        Archive: job_201004151121_0004.jar
        inflating: load_diff.py
        inflating: getmaps.py
        inflating: warehouse.zipp
        inflating: envs.cfg
        ...

        Any particular reason for this?
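        The placement seen in both listings can be summarized as a naming rule. The Python sketch here is a hypothetical model of that observed behaviour, not Hadoop's packaging code: the function name job_jar_entry is invented for illustration, and it encodes only what this issue reports (a ".zip" suffix sends the file to lib/, while anything else, such as ".zipp", ".py" or ".cfg", stays at the top level of the job jar). Whether other suffixes or an upper-case ".ZIP" are treated the same way is not established by this issue.

```python
import posixpath


def job_jar_entry(local_path: str) -> str:
    """Hypothetical model of where a -file argument lands inside the job jar.

    Encodes only the behaviour reported in MAPREDUCE-596: names ending in
    ".zip" are stored under "lib/", all other names at the jar's top level.
    """
    name = posixpath.basename(local_path)
    if name.endswith(".zip"):
        return "lib/" + name
    return name


for path in ["/tmp/foo.txt", "/tmp/boto.zip", "/tmp/boto.zipp"]:
    print(path, "->", job_jar_entry(path))
# /tmp/foo.txt -> foo.txt
# /tmp/boto.zip -> lib/boto.zip
# /tmp/boto.zipp -> boto.zipp
```

        This matches the reporter's observation: boto.zipp is found in the mapper's current directory, while boto.zip is not.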

        Amareshwari Sriramadasu added a comment:

        A .zip file is packaged under the lib/ directory. The documentation has been updated in MAPREDUCE-1697.
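        Given that resolution, a streaming mapper that depends on a shipped .zip may need to look for it under lib/ as well as in its working directory. A minimal sketch in Python, assuming that lib/ is reachable relative to the task's working directory; whether it actually is depends on how your Hadoop version unpacks the job jar, so the search list is an assumption. find_shipped_file is a hypothetical helper, not part of Hadoop streaming.

```python
import os
import sys
from typing import Iterable, Optional


def find_shipped_file(name: str, search_dirs: Iterable[str] = (".", "lib")) -> Optional[str]:
    """Return the first existing path for a file shipped with -file, or None.

    Assumption: the file sits either in the task's working directory or in a
    "lib" subdirectory relative to it; adjust search_dirs for your setup.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical mapper start-up: make a shipped zip of Python modules importable.
    archive = find_shipped_file("boto.zip")
    if archive is None:
        sys.stderr.write("boto.zip not found in working directory or lib/\n")
    else:
        # Python's zipimport can load modules from a zip archive listed on sys.path.
        sys.path.insert(0, archive)
```

        Renaming the archive to a suffix other than .zip, as the reporter did with .zipp, is the other workaround reported in this issue for keeping the file at the top level.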


          People

          • Assignee: Unassigned
          • Reporter: Karl Anderson
          • Votes: 1
          • Watchers: 2
