Hadoop Map/Reduce
MAPREDUCE-1145

Multiple Outputs doesn't work with new API in 0.20 branch

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.20.1, 0.20.2
    • Fix Version/s: 0.20.2
    • Component/s: None
    • Labels: None

      Description

      I know this is working in the 0.21 branch but it's dependent on a ton of other refactorings and near-impossible to backport. I hacked together a quick forwards-port in o.a.h.mapreduce.lib.output.MultipleOutputs. Unit test attached, requires a one-liner change to FileOutputFormat.

      Maybe 0.20.2?

        Activity

        Hiral Patel added a comment -

        There is a bug where named outputs configured with different output key and value classes do not work: every output ends up with the same output key and value class. Attached a patch to MultipleOutputs.java that fixes this.

        Here is the diff from Jay's patch:

        313,314d312
        < +import org.apache.hadoop.io.LongWritable;
        < +import org.apache.hadoop.io.Text;
        734c732
        < +      outputFormat.getRecordWriter(new MOTaskAttemptContextWrapper(namedOutput, ctx));
        ---
        > +      outputFormat.getRecordWriter(ctx);
        876,906d873
        < +
        < +  private class MOTaskAttemptContextWrapper extends TaskAttemptContext {
        < +
        < +    private final Class<?> outputKeyClass;
        < +    private final Class<?> outputValueClass;
        < +
        < +    public MOTaskAttemptContextWrapper(final String namedOutput,
        < +        TaskAttemptContext ctx) {
        < +      super(ctx.getConfiguration(), ctx.getTaskAttemptID());
        < +      outputKeyClass = conf.getClass(MO_PREFIX + namedOutput + KEY, LongWritable.class);
        < +      outputValueClass = conf.getClass(MO_PREFIX + namedOutput + VALUE, Text.class);
        < +    }
        < +
        < +    /**
        < +     * Get the key class for the job output data.
        < +     * @return the key class for the job output data.
        < +     */
        < +    @Override
        < +    public Class<?> getOutputKeyClass() {
        < +      return outputKeyClass;
        < +    }
        < +
        < +    /**
        < +     * Get the value class for job outputs.
        < +     * @return the value class for job outputs.
        < +     */
        < +    @Override
        < +    public Class<?> getOutputValueClass() {
        < +      return outputValueClass;
        < +    }
        < +  }
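The wrapper in the diff above is a plain decorator: each named output gets a context whose key/value class getters answer from per-output configuration instead of the job-wide defaults, so the output format writes with the right types. A minimal self-contained sketch of that idea in plain Java (no Hadoop dependency; `SimpleContext`, `NamedOutputContext`, and the config map are illustrative stand-ins, not the real Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Hadoop's TaskAttemptContext: exposes the job-wide
// output key/value classes.
class SimpleContext {
    public Class<?> getOutputKeyClass() { return Long.class; }
    public Class<?> getOutputValueClass() { return String.class; }
}

// Per-named-output wrapper, analogous to MOTaskAttemptContextWrapper in
// the diff: it answers the class getters from a per-output table,
// falling back to the base context's defaults (the real patch uses
// conf.getClass(MO_PREFIX + namedOutput + KEY, ...) for the lookup).
class NamedOutputContext extends SimpleContext {
    private final Class<?> keyClass;
    private final Class<?> valueClass;

    NamedOutputContext(String namedOutput, Map<String, Class<?>[]> conf,
                       SimpleContext base) {
        Class<?>[] kv = conf.getOrDefault(namedOutput,
                new Class<?>[] { base.getOutputKeyClass(), base.getOutputValueClass() });
        this.keyClass = kv[0];
        this.valueClass = kv[1];
    }

    @Override public Class<?> getOutputKeyClass() { return keyClass; }
    @Override public Class<?> getOutputValueClass() { return valueClass; }
}

public class WrapperDemo {
    public static void main(String[] args) {
        Map<String, Class<?>[]> conf = new HashMap<>();
        conf.put("errors", new Class<?>[] { Integer.class, String.class });

        SimpleContext base = new SimpleContext();
        SimpleContext errors = new NamedOutputContext("errors", conf, base);
        SimpleContext other = new NamedOutputContext("stats", conf, base);

        System.out.println(errors.getOutputKeyClass().getSimpleName()); // Integer
        System.out.println(other.getOutputKeyClass().getSimpleName());  // Long
    }
}
```

A record writer handed the wrapped context sees the per-output classes through the ordinary getters, which is why the one-line change at line 734 of the diff is the only call-site modification needed.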

        Jay Booth added a comment -

        It was brought to my attention that the patch I posted before had the wrong unit test attached - found a few spare minutes and cleaned it up, in case any passersby want this capability. Not reopening issue for 0.20 branch.

        Jay Booth added a comment -

        Understood, I guess we can just leave this up if people want to take the responsibility for applying it themselves

        Chris Douglas added a comment -

        > Old patches were against our internal snapshot, which had different revision #s and borked the patch
        >
        > This one's against hadoop/common/branches/0.20, should work, right?

        Hudson applies the most recently attached file as a patch against trunk. It takes neither the fix version on JIRA nor any artifact in the patch (such as revision) into account when applying it and running tests.

        With few exceptions, we don't backport features into old branches. Pushing new code into stable releases forces a cascade of new bug reports and unanticipated interactions that we cannot sustain in concert with development on trunk.

        Jay Booth added a comment -

        Old patches were against our internal snapshot, which had different revision #s and borked the patch

        This one's against hadoop/common/branches/0.20, should work, right?

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12423031/multiple-outputs.patch
        against trunk revision 831037.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/113/console

        This message is automatically generated.


          People

          • Assignee:
            Unassigned
          • Reporter:
            Jay Booth
          • Votes: 0
          • Watchers: 4
