  1. Hadoop Map/Reduce
  2. MAPREDUCE-3584

streaming.jar -file packaging forgets timestamps

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.20.2
    • None
    • None
    • hadoop streaming

    Description

      When invoking "hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -file <files>",
      Hadoop packages the files <files> but does not preserve their timestamps.
      After the files are unpacked in <tmp_dir>/mapred/local/taskTracker/jobcache/job_$job/jars, every file carries the time of unpacking as its timestamp.
      As a result, meaningful information is lost.
      For example, I ship some files along with my job, and I need to compare the age (mtime) of two files and rebuild one of them if it is too old.
      Because of this Hadoop behavior, my logic breaks.
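
      To illustrate, here is a minimal sketch of the kind of mtime check described above, together with one possible workaround. It is not the reporter's actual code. The function names and the manifest file are hypothetical and are not part of Hadoop. The workaround assumes a small manifest of mtimes is written before job submission, shipped as one more -file argument, and reapplied inside the task.

```python
import json
import os


def needs_rebuild(source_path, derived_path):
    """Return True when the derived file is older than its source."""
    return os.path.getmtime(derived_path) < os.path.getmtime(source_path)


def record_mtimes(paths, manifest_path):
    """Before submitting: save each file's mtime, keyed by base name."""
    mtimes = {os.path.basename(p): os.path.getmtime(p) for p in paths}
    with open(manifest_path, "w") as fh:
        json.dump(mtimes, fh)


def restore_mtimes(manifest_path, directory="."):
    """Inside the task: reapply the recorded mtimes to the unpacked files."""
    with open(manifest_path) as fh:
        mtimes = json.load(fh)
    for name, mtime in mtimes.items():
        path = os.path.join(directory, name)
        if os.path.exists(path):
            # os.utime takes an (atime, mtime) pair; both are set to the
            # recorded mtime because the original atime is not tracked.
            os.utime(path, (mtime, mtime))
```

      Once the files have been unpacked with fresh timestamps, needs_rebuild can no longer tell which file is really older. Calling restore_mtimes first makes the comparison meaningful again, provided the manifest was shipped with the job.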


          People

            Assignee: Unassigned
            Reporter: Dieter Plaetinck (dieter_be)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: 2h
                Remaining: 2h
                Logged: Not Specified
