Hadoop Common
  1. Hadoop Common
  2. HADOOP-2302

Streaming should provide an option for numerical sort of keys

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.19.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Introduced numerical key comparison for streaming.

      Description

      It would be good to have an option for numerical sort of keys for streaming.

      1. 2302.1.patch
        41 kB
        Devaraj Das
      2. 2302.3.patch
        43 kB
        Devaraj Das
      3. 2302-new.patch
        52 kB
        Devaraj Das
      4. 2302-new.patch
        51 kB
        Devaraj Das
      5. 2302-new1.patch
        52 kB
        Devaraj Das
      6. 2302-new2.patch
        52 kB
        Devaraj Das

        Activity

        Hide
        arkady borkovsky added a comment -

        This issue should probably be generalized to
        "Streaming should provide more powerful specification primary keys comparator"
        (meaning the comparator used for splits).

        It should allow at least
        – numeric comparison
        – reverse order comparison
        – multiple field comparison

        One possible way to specify the comparison in the streaming command line is to use the familiar syntax of Unix sort command, like
        "-k2,2rn -k1,1"
        for "compare the second field, numerically, large value first; if equal, compare the first field, alphabetically"

        Note that this specification implicitly defines the part of the string that is the key for shuffling purposes

        Show
        arkady borkovsky added a comment - This issue should probably be generalized to "Streaming should provide more powerful specification primary keys comparator" (meaning the comparator used for splits). It should allow at least – numeric comparison – reverse order comparison – multiple field comparison One possible way to specify the comparison in the streaming command line is to use the familiar syntax of Unix sort command, like "-k2,2rn -k1,1" for "compare the second field, numerically, large value first; if equal, compare the first field, alphabetically" Note that this specification implicitly defines the part of the string that is the key for shuffling purposes
        Hide
        Devaraj Das added a comment -

        Would supporting the basic unix/gnu sort options in the comparator work:
        -f, (Ignore-case)
        -n, (Sort numerically)
        -r, (Reverse the result of comparison)
        -k pos1[,pos2], where pos is of the form f[.c][opts], where f is the number of the field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options (any of fnr as described above).

        We assume that the fields in the key are separated by map.output.key.field.separator (already exists).

        Do we need anything else?

        Also, this could be done in a Java comparator implementation that the user specifies to the framework via mapred.output.key.comparator.class. This comparator would be used by both sort and merge.

        Show
        Devaraj Das added a comment - Would supporting the basic unix/gnu sort options in the comparator work: -f, (Ignore-case) -n, (Sort numerically) -r, (Reverse the result of comparison) -k pos1 [,pos2] , where pos is of the form f [.c] [opts] , where f is the number of the field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options (any of fnr as described above). We assume that the fields in the key are separated by map.output.key.field.separator (already exists). Do we need anything else? Also, this could be done in a Java comparator implementation that the user specifies to the framework via mapred.output.key.comparator.class. This comparator would be used by both sort and merge.
        Hide
        Arkady Borkovsky added a comment -

        I think this describes the needs very well.

        Show
        Arkady Borkovsky added a comment - I think this describes the needs very well.
        Hide
        Devaraj Das added a comment -

        A reasonably well tested patch. The following is done:
        1) Options supported are -n (numeric comparison) and -r (reverse the result of comparison). So for e.g., one could say "-k1.1,1.2 -k2.1,2.3n -k2.4,3.1nr" as the value of mapred.text.key.comparator.job option (that the comparator understands).
        2) Some refactoring is done - I needed access to the findBytes method defined in Streaming.UTF8ByteArrayUtils. But since this comparator implementation need not be dependent on the Streaming package, I made a new class org.apache.hadoop.util.UTF8ByteArrayUtils and filled that up with the "real" bytearray util methods. A few Streaming specific methods like findTab also exists in the Streaming.UTF8ByteArrayUtils. I moved them to a new class called Streaming.StreamKeyValUtil. All in all, i introduced two new classes and deprecated Streaming.UTF8ByteArrayUtils.
        3) There is a partitioner function defined that would take a hash of just the portions of the keys that the user is interested in (using the same spec as the one defined for the comparator).

        A note - the numCompare method in the comparator may be slightly verbose in terms of the code but that should help readability of the code.

        Show
        Devaraj Das added a comment - A reasonably well tested patch. The following is done: 1) Options supported are -n (numeric comparison) and -r (reverse the result of comparison). So for e.g., one could say "-k1.1,1.2 -k2.1,2.3n -k2.4,3.1nr" as the value of mapred.text.key.comparator.job option (that the comparator understands). 2) Some refactoring is done - I needed access to the findBytes method defined in Streaming.UTF8ByteArrayUtils. But since this comparator implementation need not be dependent on the Streaming package, I made a new class org.apache.hadoop.util.UTF8ByteArrayUtils and filled that up with the "real" bytearray util methods. A few Streaming specific methods like findTab also exists in the Streaming.UTF8ByteArrayUtils. I moved them to a new class called Streaming.StreamKeyValUtil. All in all, i introduced two new classes and deprecated Streaming.UTF8ByteArrayUtils. 3) There is a partitioner function defined that would take a hash of just the portions of the keys that the user is interested in (using the same spec as the one defined for the comparator). A note - the numCompare method in the comparator may be slightly verbose in terms of the code but that should help readability of the code.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12386652/2302.1.patch
        against trunk revision 678845.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

        -1 release audit. The applied patch generated 211 release audit warnings (more than the trunk's current 210 warnings).

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/current/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12386652/2302.1.patch against trunk revision 678845. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 4 new Findbugs warnings. -1 release audit. The applied patch generated 211 release audit warnings (more than the trunk's current 210 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/current/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2923/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        This patch fixes the hudson warnings

        Show
        Devaraj Das added a comment - This patch fixes the hudson warnings
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12386822/2302.3.patch
        against trunk revision 679601.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12386822/2302.3.patch against trunk revision 679601. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2947/console This message is automatically generated.
        Hide
        Arun C Murthy added a comment -

        Overall it looks fine, except I'd separate out the partitioner and the comparators i.e. split up UnixComparatorPartitioner. Also, it might be good to add a Hadoop Streaming test case too.

        Show
        Arun C Murthy added a comment - Overall it looks fine, except I'd separate out the partitioner and the comparators i.e. split up UnixComparatorPartitioner. Also, it might be good to add a Hadoop Streaming test case too.
        Hide
        Devaraj Das added a comment -

        Fixes Arun's comment on splitting the comparator and partitioner. The streaming testcase seems redundant since i already have a testcase for the Java app, and the code path in the comparator would exactly be the same for a streaming app.

        Show
        Devaraj Das added a comment - Fixes Arun's comment on splitting the comparator and partitioner. The streaming testcase seems redundant since i already have a testcase for the Java app, and the code path in the comparator would exactly be the same for a streaming app.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12387221/2302-new.patch
        against trunk revision 681223.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        -1 release audit. The applied patch generated 213 release audit warnings (more than the trunk's current 212 warnings).

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/testReport/
        Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/current/releaseAuditDiffWarnings.txt
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12387221/2302-new.patch against trunk revision 681223. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 213 release audit warnings (more than the trunk's current 212 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/current/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2987/console This message is automatically generated.
        Hide
        Arun C Murthy added a comment -

        +1, we just need to fix the javac and audit warnings...

        Show
        Arun C Murthy added a comment - +1, we just need to fix the javac and audit warnings...
        Hide
        Devaraj Das added a comment - - edited

        The release audit warning is fixed. The javadoc warning doesn't make sense to me.

        trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/UTF8ByteArrayUtils.java:210: warning - Tag @link: can't find readLine(LineReader, Text) in org.apache.hadoop.streaming.StreamKeyValUtil
        

        The method org.apache.hadoop.streaming.StreamKeyValUtil.readLine with the proper javadoc does exist.

        Show
        Devaraj Das added a comment - - edited The release audit warning is fixed. The javadoc warning doesn't make sense to me. trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/UTF8ByteArrayUtils.java:210: warning - Tag @link: can't find readLine(LineReader, Text) in org.apache.hadoop.streaming.StreamKeyValUtil The method org.apache.hadoop.streaming.StreamKeyValUtil.readLine with the proper javadoc does exist.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12387255/2302-new.patch
        against trunk revision 681243.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12387255/2302-new.patch against trunk revision 681243. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2993/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        Sorry, had attached the incorrect patch earlier.

        Show
        Devaraj Das added a comment - Sorry, had attached the incorrect patch earlier.
        Hide
        Devaraj Das added a comment -

        Submitting to hudson to get the tests through. javadoc is bound to fail for the reason mentioned earlier (pls let me know if anyone has an workaround for this or if i am missing something).

        Show
        Devaraj Das added a comment - Submitting to hudson to get the tests through. javadoc is bound to fail for the reason mentioned earlier (pls let me know if anyone has an workaround for this or if i am missing something).
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12387272/2302-new1.patch
        against trunk revision 681243.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12387272/2302-new1.patch against trunk revision 681243. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2998/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        Thanks to Sharad for helping me debug the javadoc issue. The issue was that LineReader is an inner class of LineRecordReader and I had to qualify LineReader as LineRecordReader.LineReader in the javadoc link.

        Show
        Devaraj Das added a comment - Thanks to Sharad for helping me debug the javadoc issue. The issue was that LineReader is an inner class of LineRecordReader and I had to qualify LineReader as LineRecordReader.LineReader in the javadoc link.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12387506/2302-new2.patch
        against trunk revision 682978.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12387506/2302-new2.patch against trunk revision 682978. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3014/console This message is automatically generated.
        Hide
        Devaraj Das added a comment -

        I just committed this.

        Show
        Devaraj Das added a comment - I just committed this.
        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )
        Hide
        Robert Chansler added a comment -

        Introduced numerical key comparison for streaming. The key specification of the form -k pos1[,pos2], is supported, where,

        • pos is of the form f[.c][opts], where f is the number of the key field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options. The supported options are:
        • -n, (Sort numerically)
        • -r, (Reverse the result of comparison)
          Multiple key specifications can be provided to sort on multiple key fields (in order). The key fields are assumed to be separated using map.output.key.field.separator. Specifying only -n as an option will make the comparator treat the all key bytes as one field.

        [Is this user guide in the streaming user documentation?]

        Show
        Robert Chansler added a comment - Introduced numerical key comparison for streaming. The key specification of the form -k pos1 [,pos2] , is supported, where, pos is of the form f [.c] [opts] , where f is the number of the key field to use, and c is the number of the first character from the beginning of the field. Fields and character positions are numbered starting with 1; a character position of zero in pos2 indicates the field's last character. If '.c' is omitted from pos1, it defaults to 1 (the beginning of the field); if omitted from pos2, it defaults to 0 (the end of the field). opts are ordering options. The supported options are: -n, (Sort numerically) -r, (Reverse the result of comparison) Multiple key specifications can be provided to sort on multiple key fields (in order). The key fields are assumed to be separated using map.output.key.field.separator. Specifying only -n as an option will make the comparator treat the all key bytes as one field. [Is this user guide in the streaming user documentation?]

          People

          • Assignee:
            Devaraj Das
            Reporter:
            Lohit Vijayarenu
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development