Hadoop Map/Reduce / MAPREDUCE-596

can't package zip file with hadoop streaming -file argument

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: contrib/streaming
    • Labels:
      None

      Description

      I'm unable to ship a file with a .zip suffix to the mapper using the -file argument for hadoop streaming. I am able to ship it if I change the suffix to .zipp. Is this a bug, or does it perhaps have something to do with the jar file format that is used to send files to the instance?

      For example, with this hadoop invocation, and local files "/tmp/boto.zip" and "/tmp/boto.zipp" which are copies of each other:

      $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar -mapper $KCLUSTER_SRC/testmapper.py -reducer $KCLUSTER_SRC/testreducer.py -input input/foo -output output -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp

      I see this line in the invocation standard output:

      packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp, /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null

      But in the current directory of the mapper process, "boto.zip" does not exist, while "boto.zipp" does.

        Activity

        Transition: Open → Resolved
        Time In Source Status: 685d 13h 35m
        Execution Times: 1
        Last Executer: Amareshwari Sriramadasu
        Last Execution Date: 08/Jun/10 12:51
        Amareshwari Sriramadasu made changes -
        Status: Open [ 1 ] → Resolved [ 5 ]
        Resolution: Invalid [ 6 ]
        Amareshwari Sriramadasu added a comment -

        The zip file is packaged under the lib/ directory of the job jar. The documentation was updated in MAPREDUCE-1697.
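
        Given that placement, a Python mapper can look for a shipped archive both at the top level and under lib/. The following is a minimal sketch, not part of Hadoop: the helper names `find_shipped_file` and `add_zip_to_path` are hypothetical, and it assumes the unpacked lib/ directory is reachable relative to the task's working directory, which this thread does not confirm.

```python
import os
import sys


def find_shipped_file(name, search_dirs=(".", "lib")):
    """Return the first existing path for `name` under search_dirs, or None.

    The search order (working directory first, then lib/) is an assumption
    based on where the job jar listings in this issue show the files.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def add_zip_to_path(name):
    """Put a shipped zip archive on sys.path so modules can be imported from it."""
    path = find_shipped_file(name)
    if path is None:
        raise IOError("shipped file not found: %s" % name)
    sys.path.insert(0, os.path.abspath(path))
    return path
```

        In a mapper such as testmapper.py, calling `add_zip_to_path("boto.zip")` before `import boto` would let the import resolve from the archive wherever it was unpacked, provided the assumption about lib/ holds on the cluster in question.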

        Trevor Rundell added a comment -

        Apparently this issue is still around. When trying to distribute a .zip file with -file I end up with a job jar structure something like this...

        Archive: job_201004151121_0002.jar
        inflating: load_diff.py
        inflating: getmaps.py
        inflating: lib/warehouse.zip
        inflating: envs.cfg
        ...

        For some reason, the zip file ends up in the lib/ directory. When I change the extension to .zipp the file ends up in the top level like I'd expect it to.

        Archive: job_201004151121_0004.jar
        inflating: load_diff.py
        inflating: getmaps.py
        inflating: warehouse.zipp
        inflating: envs.cfg
        ...

        Any particular reason for this?
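
        The same check can be done programmatically instead of reading an unzip listing. This is a minimal sketch using Python's standard zipfile module; the function name `locate_in_jar` is hypothetical, and it only reports where an entry sits inside a jar, it does not explain or change how streaming builds the jar.

```python
import posixpath
import zipfile


def locate_in_jar(jar, basename):
    """Return every entry path in the jar whose file name equals basename.

    `jar` may be a path or a file-like object. Jar entries always use
    forward slashes, so posixpath is used to split them.
    """
    with zipfile.ZipFile(jar) as archive:
        return [
            entry
            for entry in archive.namelist()
            if posixpath.basename(entry) == basename
        ]
```

        Run against a job jar like the first one listed above, `locate_in_jar(path_to_jar, "warehouse.zip")` would be expected to return `["lib/warehouse.zip"]`, which makes the relocation easy to spot.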

        Owen O'Malley made changes -
        Field Original Value New Value
        Project Hadoop Common [ 12310240 ] Hadoop Map/Reduce [ 12310941 ]
        Key HADOOP-3811 MAPREDUCE-596
        Affects Version/s 0.17.0 [ 12312913 ]
        Component/s contrib/streaming [ 12312905 ]
        Component/s contrib/streaming [ 12310972 ]
        Karl Anderson created issue -

          People

          • Assignee:
            Unassigned
            Reporter:
            Karl Anderson
          • Votes:
            1
            Watchers:
            2
