Hadoop Map/Reduce
MAPREDUCE-1403

Save file-sizes of each of the artifacts in DistributedCache in the JobConf

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: client
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Added private configuration variables: mapred.cache.files.filesizes and mapred.cache.archives.filesizes to store sizes of distributed cache artifacts per job. This can be used by tools like Gridmix in simulation runs.
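As a rough illustration of how a tool like Gridmix might consume these keys, the Java sketch below reads them back and totals the bytes a simulated job would need to localize. It assumes the values are comma-separated byte counts in the same order as the cache URIs (mirroring mapred.cache.files.timestamps). A plain Map is used as a hypothetical stand-in for JobConf, and CacheSizeReader is an illustrative name, not part of Hadoop.

```java
import java.util.Map;

/** Hypothetical reader for the per-job distributed cache size keys. */
final class CacheSizeReader {
  static final String FILE_SIZES_KEY = "mapred.cache.files.filesizes";
  static final String ARCHIVE_SIZES_KEY = "mapred.cache.archives.filesizes";

  private CacheSizeReader() {}

  /**
   * Parses the value stored under {@code key}.
   * Assumption: comma-separated byte counts, like the timestamps key.
   * A missing or empty value yields an empty array.
   */
  static long[] parseSizes(Map<String, String> conf, String key) {
    String value = conf.get(key);
    if (value == null || value.trim().isEmpty()) {
      return new long[0];
    }
    String[] parts = value.split(",");
    long[] sizes = new long[parts.length];
    for (int i = 0; i < parts.length; i++) {
      sizes[i] = Long.parseLong(parts[i].trim());
    }
    return sizes;
  }

  /** Total bytes of cache files plus cache archives recorded for the job. */
  static long totalCacheBytes(Map<String, String> conf) {
    long total = 0;
    for (long size : parseSizes(conf, FILE_SIZES_KEY)) {
      total += size;
    }
    for (long size : parseSizes(conf, ARCHIVE_SIZES_KEY)) {
      total += size;
    }
    return total;
  }
}
```

A simulator could then size its synthetic cache artifacts from totalCacheBytes(conf) or from the individual entries, without re-reading the original files.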

      Description

      It would be a useful metric to collect; potentially, GridMix could use it to emulate jobs that use the DistributedCache.

      1. MAPREDUCE-1403_yhadoop20.patch
        7 kB
        Arun C Murthy
      2. MAPREDUCE-1403_yhadoop20-1.patch
        7 kB
        Hemanth Yamijala
      3. MAPREDUCE-1403_yhadoop20-2.patch
        7 kB
        Hemanth Yamijala
      4. MR-1403-trunk-1.patch
        11 kB
        Luke Lu
      5. mr-1403-trunk-v2.patch
        11 kB
        Luke Lu
      6. mr-1403-trunk-v3.patch
        10 kB
        Luke Lu
      7. mr-1403-trunk-v4.patch
        10 kB
        Luke Lu

        Activity

        Arun C Murthy created issue -
        Hong Tang made changes -
        Field Original Value New Value
        Assignee Arun C Murthy [ acmurthy ] Hong Tang [ hong.tang ]
        Arun C Murthy added a comment -

        I propose we save it in the job-conf on the client side, while setting up the distributed cache for the job, under a key like mapred.cache.files.sizes (akin to mapred.cache.files.timestamps, etc.).
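A minimal sketch of this proposal: while the client sets up the cache, it walks the cache files once and writes each size next to its timestamp, as parallel comma-separated lists in the same order. The Map-based conf and the CacheEntry type are hypothetical stand-ins for JobConf and Hadoop's FileStatus. The key name follows the release note (mapred.cache.files.filesizes) rather than the mapred.cache.files.sizes suggested in the comment.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical client-side helper that records cache file metadata. */
final class CacheSizeWriter {
  /** Hypothetical stand-in for the FileStatus of one cache artifact. */
  static final class CacheEntry {
    final long modificationTime;
    final long length;

    CacheEntry(long modificationTime, long length) {
      this.modificationTime = modificationTime;
      this.length = length;
    }
  }

  private CacheSizeWriter() {}

  /**
   * Records timestamps and sizes for the cache files as parallel
   * comma-separated lists, in the same order as the files.
   */
  static void recordFiles(Map<String, String> conf, List<CacheEntry> files) {
    StringJoiner timestamps = new StringJoiner(",");
    StringJoiner sizes = new StringJoiner(",");
    for (CacheEntry entry : files) {
      timestamps.add(Long.toString(entry.modificationTime));
      sizes.add(Long.toString(entry.length));
    }
    conf.put("mapred.cache.files.timestamps", timestamps.toString());
    // Key name from the release note; the proposal called it mapred.cache.files.sizes.
    conf.put("mapred.cache.files.filesizes", sizes.toString());
  }
}
```

Keeping both lists in the same order means index i in the sizes list describes the same file as index i in the timestamps list, so no extra keying is needed.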

        Arun C Murthy made changes -
        Assignee Hong Tang [ hong.tang ] Arun C Murthy [ acmurthy ]
        Arun C Murthy added a comment -

        Patch for y20 distribution. Not to be committed.

        Arun C Murthy made changes -
        Attachment MAPREDUCE-1403_yhadoop20.patch [ 12435330 ]
        Hemanth Yamijala added a comment -

        Arun, patch looks fine. There were a few minor nits that I have fixed in the attached patch:

        • In DistributedCache.java, I refactored getTimestamp to reuse getFileStatus, as the entire code was duplicated.
        • In JobClient.java, there was an extraneous String.valueOf when constructing the modification time buffer. Something like:
          +        new StringBuffer(String.valueOf(
          +            String.valueOf(status.getModificationTime())));
          

          Fixed that to remove the extraneous call.

        • In MRCaching, changed a System.err.println to System.out.println to match the rest of the output; the debug statements now appear alongside the rest of the output, so I thought it would be easier to debug if required.

        Please verify the changes once.
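The first two nits can be sketched as follows. Status and the in-memory map are hypothetical stand-ins for Hadoop's FileStatus and FileSystem, and this is not the actual DistributedCache code. The point is that getTimestamp (and a size accessor like the one this issue needs) delegates to a single getFileStatus lookup, and the timestamp buffer is seeded with one String.valueOf call instead of two nested ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the getFileStatus reuse and buffer fix. */
final class CacheStatusExample {
  /** Hypothetical stand-in for Hadoop's FileStatus. */
  static final class Status {
    final long modificationTime;
    final long length;

    Status(long modificationTime, long length) {
      this.modificationTime = modificationTime;
      this.length = length;
    }
  }

  // Hypothetical in-memory "file system" keyed by URI string.
  private final Map<String, Status> files = new HashMap<>();

  void put(String uri, long modificationTime, long length) {
    files.put(uri, new Status(modificationTime, length));
  }

  /** The single place that performs the status lookup. */
  Status getFileStatus(String uri) {
    Status status = files.get(uri);
    if (status == null) {
      throw new IllegalArgumentException("No such file: " + uri);
    }
    return status;
  }

  /** Reuses getFileStatus instead of duplicating the lookup code. */
  long getTimestamp(String uri) {
    return getFileStatus(uri).modificationTime;
  }

  /** Same lookup, exposing the size this issue wants to record. */
  long getFileSize(String uri) {
    return getFileStatus(uri).length;
  }

  /** Comma-separated timestamps; a single String.valueOf is enough. */
  String timestamps(String first, String... rest) {
    StringBuffer buffer = new StringBuffer(String.valueOf(getTimestamp(first)));
    for (String uri : rest) {
      buffer.append(',').append(getTimestamp(uri));
    }
    return buffer.toString();
  }
}
```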

        Hemanth Yamijala made changes -
        Attachment MAPREDUCE-1403_yhadoop20-1.patch [ 12435768 ]
        Arun C Murthy added a comment -

        +1, thanks for the review Hemanth.

        Hemanth Yamijala made changes -
        Release Note Added private configuration variables: mapred.cache.files.filesizes and mapred.cache.archives.filesizes to store sizes of distributed cache artifacts per job. This can be used by tools like Gridmix in simulation runs.
        Hemanth Yamijala added a comment -

        More up-to-date patch for an older version of Hadoop. Not for commit here.

        Hemanth Yamijala made changes -
        Attachment MAPREDUCE-1403_yhadoop20-2.patch [ 12436842 ]
        Luke Lu added a comment -

        Ported to trunk.

        Luke Lu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Affects Version/s 0.22.0 [ 12314184 ]
        Luke Lu made changes -
        Attachment MR-1403-trunk-1.patch [ 12436997 ]
        Chris Douglas added a comment -
        • The patch includes a whitespace change to Job
        • Can you explain the addition of getConfiguration to the RunningJob interface? Is the relevant copy in JobContextImpl?
        • Please retain javadoc for CACHE_FILES_SIZES and CACHE_ARCHIVES_SIZES, rather than code comments
        • Javadoc referring to classes (such as TrackerDistributedCacheManager::getFileStatus) should probably use {@link org.class.name} instead of <code>name</code>

        • The actual/expected in the error message for assertEquals is redundant
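A sketch of what the javadoc and assertion suggestions could look like. The class, its key values (taken from the release note), and the assertEquals stand-in are hypothetical, not the committed code. The stand-in roughly mirrors JUnit 4's failure text, which already reports the expected and actual values, so the caller's message only needs to say what is being compared.

```java
/** Hypothetical holder for the new keys, documented with javadoc. */
final class CacheSizeKeys {
  /**
   * Comma-separated sizes, in bytes, of the files in the distributed cache,
   * listed in the same order as the cache file URIs.
   */
  static final String CACHE_FILES_SIZES = "mapred.cache.files.filesizes";

  /**
   * Counterpart of {@link #CACHE_FILES_SIZES} for distributed cache archives.
   */
  static final String CACHE_ARCHIVES_SIZES = "mapred.cache.archives.filesizes";

  private CacheSizeKeys() {}

  /**
   * Minimal stand-in for JUnit's assertEquals(String, long, long). The
   * failure text already includes both values, so {@code message} should
   * describe what is compared rather than repeat "expected X, actual Y".
   */
  static void assertEquals(String message, long expected, long actual) {
    if (expected != actual) {
      throw new AssertionError(
          message + " expected:<" + expected + "> but was:<" + actual + ">");
    }
  }
}
```

In a test, a call such as assertEquals("size of cache file 0", 1024L, sizes[0]) then reads cleanly on failure, whereas a hand-built message like "expected 1024, actual " + sizes[0] would print the values twice.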
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Luke Lu added a comment -

        Discussed items 2 and 3 with Chris; incorporated the rest of the suggestions.

        Luke Lu made changes -
        Attachment mr-1403-trunk-v2.patch [ 12438333 ]
        Luke Lu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12438333/mr-1403-trunk-v2.patch
        against trunk revision 921069.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/513/console

        This message is automatically generated.

        Luke Lu added a comment -

        Rebased the patch against trunk.

        Luke Lu made changes -
        Attachment mr-1403-trunk-v3.patch [ 12438342 ]
        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Chris Douglas made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12438342/mr-1403-trunk-v3.patch
        against trunk revision 921069.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/28/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/28/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/28/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/28/console

        This message is automatically generated.

        Vinod Kumar Vavilapalli added a comment -

        Core tests got a -1 because TestJobRetire failed; checking the console output, I found that HADOOP-6528 was hit again.

        Luke Lu added a comment -

        Added javadoc for the getConfiguration method in the RunningJob interface.
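A hedged sketch of a documented getConfiguration accessor and how a test might use it to read back the recorded sizes. RunningJobView and SizeCheck are hypothetical names; the real RunningJob method most likely returns a Hadoop Configuration, and a Map is used here only so the example runs standalone.

```java
import java.util.Map;

/** Hypothetical, trimmed-down stand-in for the RunningJob interface. */
interface RunningJobView {
  /**
   * Get the configuration the job was submitted with, including keys the
   * client added at submission time (such as distributed cache sizes).
   *
   * @return the job's configuration; never null
   */
  Map<String, String> getConfiguration();
}

/** Hypothetical test helper that reads recorded sizes via the accessor. */
final class SizeCheck {
  private SizeCheck() {}

  /** Returns the recorded file sizes, or "" if none were recorded. */
  static String recordedFileSizes(RunningJobView job) {
    return job.getConfiguration().getOrDefault("mapred.cache.files.filesizes", "");
  }
}
```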

        Luke Lu made changes -
        Attachment mr-1403-trunk-v4.patch [ 12438471 ]
        Chris Douglas added a comment -

        TestJobRetire passes on my machine.

        +1

        I committed this. Thanks, Arun and Luke!

        Chris Douglas made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #273 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/273/)
        . Save the size and number of distributed cache artifacts in the
        configuration. Contributed by Arun Murthy

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #255 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/255/)
        . Save the size and number of distributed cache artifacts in the
        configuration. Contributed by Arun Murthy

        Tom White made changes -
        Fix Version/s 0.21.0 [ 12314045 ]
        Fix Version/s 0.22.0 [ 12314184 ]
        Tom White made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition                    Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Patch Available -> Open       7d 22h 3m               2                 Chris Douglas   09/Mar/10 23:48
        Open -> Patch Available       38d 55m                 3                 Chris Douglas   09/Mar/10 23:48
        Patch Available -> Resolved   1d 10h 39m              1                 Chris Douglas   11/Mar/10 10:28
        Resolved -> Closed            166d 10h 52m            1                 Tom White       24/Aug/10 22:20

          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Arun C Murthy
          • Votes:
            0
            Watchers:
            6
