Hadoop Map/Reduce
MAPREDUCE-1785

Add streaming config option for not emitting the key

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.22.0
    • Fix Version/s: 0.22.0
    • Component/s: contrib/streaming
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      Added a configuration property "stream.map.input.ignoreKey" that specifies whether the key is omitted when writing input for the mapper. The property takes effect only if stream.map.input.writer.class is org.apache.hadoop.streaming.io.TextInputWriter.class. All other InputWriters always write the key.
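
The rule in the release note can be modeled in a few lines. The following Python sketch only illustrates the described behavior; it is not the Hadoop implementation (which is Java). The function name `mapper_input_line`, its parameters, the hypothetical writer class name, and the tab separator between key and value are assumptions made for the example.

```python
# Class name taken from the release note (without the trailing ".class").
TEXT_INPUT_WRITER = "org.apache.hadoop.streaming.io.TextInputWriter"


def mapper_input_line(key, value, ignore_key,
                      writer_class=TEXT_INPUT_WRITER, separator="\t"):
    """Illustrative model of the line handed to the streaming mapper."""
    # Per the release note, the property is honored only for TextInputWriter;
    # every other InputWriter always writes the key.
    if ignore_key and writer_class == TEXT_INPUT_WRITER:
        return value
    # Assumption: key and value are joined by a tab when both are written.
    return f"{key}{separator}{value}"


if __name__ == "__main__":
    print(repr(mapper_input_line("0", "hello world", ignore_key=True)))
    print(repr(mapper_input_line("0", "hello world", ignore_key=False)))
    # Hypothetical non-text writer: the setting has no effect.
    print(repr(mapper_input_line("0", "hello world", ignore_key=True,
                                 writer_class="example.HypotheticalInputWriter")))
```

In this model, the key is dropped only when the flag is set and the writer is TextInputWriter; in both other calls the key is written.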

      Description

      PipeMapper currently does not emit the key when using TextInputFormat. If you switch to another input format (e.g., LzoTextInputFormat), the key will be emitted. We should add an option that lets users explicitly tell streaming not to emit the key, so they can change input formats without breaking or having to modify their existing programs.
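
To see why an emitted key can break an existing program, consider a mapper written for value-only input. The sketch below is a hypothetical Python word counter (the function `count_words` and the sample records are invented for illustration). It assumes that the key, when emitted, is a byte offset followed by a tab.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens, treating each line as pure text."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Input as the existing mapper expects it: value only.
value_only = ["the quick fox", "the lazy dog"]

# The same records with a key (assumed to be a byte offset) and a tab prepended.
with_key = ["0\tthe quick fox", "14\tthe lazy dog"]

if __name__ == "__main__":
    print(sorted(count_words(value_only)))
    print(sorted(count_words(with_key)))
```

With the key present, the offsets `0` and `14` are counted as words. This is the kind of silent breakage the proposed option is meant to avoid.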

        Activity

        Amareshwari Sriramadasu made changes -
        Field: Release Note
        Original Value: Added a configuration property "stream.map.input.ignoreKey" to specify whether to ignore key or not while reading input.
        New Value: Added a configuration property "stream.map.input.ignoreKey" to specify whether to ignore key or not while writing input for the mapper. This configuration parameter is valid only if stream.map.input.writer.class is org.apache.hadoop.streaming.io.TextInputWriter.class. For all other InputWriter's, key is always written.
        Amareshwari Sriramadasu made changes -
        Release Note Added a configuration property "stream.map.input.ignoreKey" to specify whether to ignore key or not while reading input.
        Sharad Agarwal made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Sharad Agarwal added a comment -

        I committed this. Thanks Eli.

        Amareshwari Sriramadasu added a comment -

        Patch looks good to me also.

        -1 core tests.

        TestSpecialCharactersInOutputPath failed with NoClassDefFoundError. I ran the same test on my machine with the patch. It ran successfully.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12444445/mapreduce-1785-1.patch
        against trunk revision 944082.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/188/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/188/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/188/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/188/console

        This message is automatically generated.

        Eli Collins made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Tom White added a comment -

        +1

        Eli Collins made changes -
        Field Original Value New Value
        Attachment mapreduce-1785-1.patch [ 12444445 ]
        Eli Collins added a comment -

        Patch attached.

        • Adds stream.map.input.ignoreKey for toggling key emission. The default behavior is unchanged.
        • Updates the streaming.xml docs and adds test coverage in TestStreamingKeyValue.
        Eli Collins created issue -

          People

          • Assignee: Eli Collins
          • Reporter: Eli Collins
          • Votes: 0
          • Watchers: 1
