Hadoop Map/Reduce
MAPREDUCE-2740

MultipleOutputs in new API creates needless TaskAttemptContexts

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.0
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      MultipleOutputs.write creates a new TaskAttemptContext on every call, which we've seen take a significant amount of CPU. The TaskAttemptContext constructor creates a JobConf, gets the current UGI, etc. I don't see any reason it needs to do this instead of creating a single TaskAttemptContext when the InputFormat is created (or creating it lazily and caching it as a member).

      1. mr-2740.txt
        2 kB
        Todd Lipcon

          Activity

          Todd Lipcon added a comment -

          Looks like MAPREDUCE-1853 fixed a similar issue, but in trunk, the following API still does this:

          public void write(KEYOUT key, VALUEOUT value, String baseOutputPath)

          Todd Lipcon added a comment -

          Simple patch lazily initializes the context for the job's configured output. No tests since it's covered by existing ones.
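
The lazy-initialization idea can be sketched roughly as follows. This is not the contents of mr-2740.txt: ExpensiveContext and LazyContextWriter are hypothetical stand-ins for TaskAttemptContext and MultipleOutputs, chosen so the example runs without Hadoop on the classpath, and it assumes write is only called from a single thread.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyContextSketch {

    /** Hypothetical stand-in for TaskAttemptContext; counts how often it is built. */
    static final class ExpensiveContext {
        static int created = 0;
        int uses = 0;

        ExpensiveContext() {
            // Per the issue description, the real constructor creates a JobConf,
            // gets the current UGI, etc. Here we only count constructions.
            created++;
        }
    }

    /** Hypothetical stand-in for MultipleOutputs. */
    static final class LazyContextWriter {
        private ExpensiveContext context; // stays null until the first write
        private final List<String> written = new ArrayList<>();

        /** Builds the context on first use, then returns the cached instance. */
        private ExpensiveContext getContext() {
            if (context == null) {
                context = new ExpensiveContext();
            }
            return context;
        }

        void write(String key, String value) {
            getContext().uses++; // reuse the cached context instead of creating a new one
            written.add(key + "=" + value);
        }

        int recordsWritten() {
            return written.size();
        }
    }

    public static void main(String[] args) {
        LazyContextWriter out = new LazyContextWriter();
        for (int i = 0; i < 1000; i++) {
            out.write("k" + i, "v" + i);
        }
        System.out.println("records written: " + out.recordsWritten());   // 1000
        System.out.println("contexts created: " + ExpensiveContext.created); // 1
    }
}
```

A plain null check is enough under the single-thread assumption; if the writer could be shared across threads, the cached field would need synchronization.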

          Harsh J added a comment -

          The changes lgtm.

          Amareshwari Sriramadasu added a comment -

          +1 from me too. Can commit once hudson comes back

          Todd Lipcon added a comment -

          Ran local test-patch:

          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
          [exec] Please justify why no new tests are needed for this patch.
          [exec] Also please list what manual steps were performed to verify this patch.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] -1 findbugs. The patch appears to introduce -7 new Findbugs (version 1.3.8) warnings.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
          [exec]
          [exec] +1 system test framework. The patch passed system test framework compile.

          • No tests are included since it's an optimization, and covered by original tests.
          • The findbugs warning is clearly bogus – negative 7?
          Todd Lipcon added a comment -

          I ran TestMultipleOutputs and TestMRMultipleOutputs and they pass. Those are the only tests which reference the changed code. I'll commit this to trunk momentarily.

          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk-Commit #760 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/760/)
          MAPREDUCE-2740. MultipleOutputs in new API creates needless TaskAttemptContexts. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1152875
          Files :

          • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
          • /hadoop/common/trunk/mapreduce/CHANGES.txt
          Hudson added a comment -

          Integrated in Hadoop-Mapreduce-trunk #751 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/751/)
          MAPREDUCE-2740. MultipleOutputs in new API creates needless TaskAttemptContexts. Contributed by Todd Lipcon.

          todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1152875
          Files :

          • /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
          • /hadoop/common/trunk/mapreduce/CHANGES.txt

            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 3
