Pig
  1. Pig
  2. PIG-1195

POSort should take care of sort order

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Patch committed to both trunk and 0.6 branch.

      Description

      POSort always use ascending order. We shall obey the sort order as specified in the script.

      For example, the following script does not do the right thing if we turn off secondary sort (which means, we will rely on POSort to sort):

      A = load 'input' as (a0:int);
      B = group A ALL;
      C = foreach B {
          D = order A by a0 desc;
          generate D;
      };
      dump C;
      

      If we run it using the command line "java -Xmx512m -Dpig.exec.nosecondarykey=true -jar pig.jar 1.pig".

      The sort order for D is ascending.

      1. PIG-1195-1.patch
        6 kB
        Daniel Dai
      2. PIG-1195-2.patch
        6 kB
        Daniel Dai
      3. PIG-1195-3.patch
        2 kB
        Daniel Dai
      4. PIG-1195-4.patch
        2 kB
        Daniel Dai

        Activity

        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12430787/PIG-1195-3.patch
        against trunk revision 900926.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430787/PIG-1195-3.patch against trunk revision 900926. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/184/console This message is automatically generated.
        Hide
        Pradeep Kamath added a comment -

        +1 for commit

        Show
        Pradeep Kamath added a comment - +1 for commit
        Hide
        Daniel Dai added a comment -

        Use createInputFile instead of generateURI in test as per Pradeep's comment.

        Show
        Daniel Dai added a comment - Use createInputFile instead of generateURI in test as per Pradeep's comment.
        Hide
        Pradeep Kamath added a comment -

        One comment:

        • In unit test I would recommend you use Util.createInputFile() method which take the minicluster as an input arg and creates a input file on the cluster. Also delete the file in a finally using Util.deleteFile() - the Util.generateURI() is something which will create problems while merging to load-store-redesign branch.
          Otherwise +1
        Show
        Pradeep Kamath added a comment - One comment: In unit test I would recommend you use Util.createInputFile() method which take the minicluster as an input arg and creates a input file on the cluster. Also delete the file in a finally using Util.deleteFile() - the Util.generateURI() is something which will create problems while merging to load-store-redesign branch. Otherwise +1
        Hide
        Ying He added a comment -

        +1

        Show
        Ying He added a comment - +1
        Hide
        Daniel Dai added a comment -

        Thanks Alan. Actually after talking with Ying, I realize originally POSort use mComparator to do the sorting. This code is broken in the current code, we should fix this rather than introduce something new.

        Show
        Daniel Dai added a comment - Thanks Alan. Actually after talking with Ying, I realize originally POSort use mComparator to do the sorting. This code is broken in the current code, we should fix this rather than introduce something new.
        Hide
        Alan Gates added a comment -

        The sorting algorithm in DefaultComparator does not match the sorting algorithm in DefaultTuple.compare.

        The algorithm used here first compares the values of each column, and only considers the overall size of the tuples once one tuple has run out of fields. The algorithm used in DefaultTuple.compare first compares tuple size, then individual column values. So in this algorithm (5, 3) > (4, 3, 1), but in DefaultTuple's algorithm (5, 3) < (4, 3, 1). We should use the same algorithm in both places.

        Show
        Alan Gates added a comment - The sorting algorithm in DefaultComparator does not match the sorting algorithm in DefaultTuple.compare. The algorithm used here first compares the values of each column, and only considers the overall size of the tuples once one tuple has run out of fields. The algorithm used in DefaultTuple.compare first compares tuple size, then individual column values. So in this algorithm (5, 3) > (4, 3, 1), but in DefaultTuple's algorithm (5, 3) < (4, 3, 1). We should use the same algorithm in both places.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12430532/PIG-1195-2.patch
        against trunk revision 899502.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430532/PIG-1195-2.patch against trunk revision 899502. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/182/console This message is automatically generated.
        Hide
        Daniel Dai added a comment -

        Change the comparison logic a little bit.

        Show
        Daniel Dai added a comment - Change the comparison logic a little bit.
        Hide
        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12430478/PIG-1195-1.patch
        against trunk revision 899502.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12430478/PIG-1195-1.patch against trunk revision 899502. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/179/console This message is automatically generated.

          People

          • Assignee:
            Daniel Dai
            Reporter:
            Daniel Dai
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development