  1. Pig
  2. PIG-927

null should be handled consistently in Join

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.6.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Currently Pig mostly follows SQL semantics for handling null. However, there are certain cases where Pig does not handle nulls consistently. One example is join: in a join on a single key, null keys do not match, so they produce no output. However, if the join is on more than one key, a key tuple containing a null value still matches another key tuple that has a null in the same position. We need to decide the right semantics here.
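
      The inconsistency described above can be illustrated with a small toy model. This is only a sketch in Python, not Pig's join code: the function `naive_join` and its `key_len` parameter are hypothetical names, and the assumption is that a single null key is skipped while a multi-column key is compared with plain tuple equality.

```python
# Toy model of the reported inconsistency; NOT Pig's actual implementation.
# Assumption: None stands in for a Pig null, and records are plain tuples
# whose first `key_len` fields form the join key.
def naive_join(left, right, key_len):
    """Inner join that drops a null single key but compares tuple keys by equality."""
    out = []
    for l in left:
        for r in right:
            lk, rk = l[:key_len], r[:key_len]
            if key_len == 1 and lk[0] is None:
                continue  # single null key: never matches (SQL-like)
            if lk == rk:  # tuple key: (1, None) == (1, None) is True, so nulls match
                out.append(l + r)
    return out


# Single-key join: the null keys do not match.
single = naive_join([(None, "a")], [(None, "b")], key_len=1)
# Two-key join: the null in the second key column still matches.
multi = naive_join([(1, None, "a")], [(1, None, "b")], key_len=2)
print(len(single), len(multi))
```

      In this model the single-key join returns no records while the two-key join returns one, which is the mismatch the issue asks to resolve.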

      1. PIG-927-2.patch
        4 kB
        Daniel Dai
      2. PIG-927-1.patch
        4 kB
        Daniel Dai

        Activity

        daijy Daniel Dai added a comment -

        Patch committed. Thanks Alan!

        alangates Alan Gates added a comment -

        Sorry, I missed the \t at the end of the line. Test looks good. +1

        daijy Daniel Dai added a comment -

        Hi, Alan,
        Thank you for your comment. In the test case, without this patch the "1\t" record in input1 matches the "1\t" record in input2, so the join produces 2 output records. With this patch, it produces only 1 output record.

        alangates Alan Gates added a comment -

        The new test doesn't seem to test this case. Other than that the code looks good. Nice comments too, made it easier to understand what was going on.

        hadoopqa Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422271/PIG-927-2.patch
        against trunk revision 825712.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/console

        This message is automatically generated.

        daijy Daniel Dai added a comment -

        This addresses the unit test failure in TestAlgebraicEval. The other unit test failure is due to a port conflict in Minicluster; hopefully it is temporary.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12422249/PIG-927-1.patch
        against trunk revision 825393.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/82/console

        This message is automatically generated.

        daijy Daniel Dai added a comment -

        In the patch, we follow SQL behavior. When we join on more than one key (a tuple key in Pig), we do not merge records if any one of the keys is null. For example, we do not merge the tuple pair below:

        (1, 2, null) vs (1, 2, null)
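
        The rule stated above can be sketched as follows. This is a minimal Python illustration, not the code in the patch: `key_is_joinable` and `sql_style_join` are hypothetical names, None stands in for a Pig null, and the first `key_len` fields of each record are assumed to form the join key.

```python
# Sketch of SQL-style null handling for join keys; not the actual patch.
def key_is_joinable(key):
    """A key takes part in the join only if none of its components is null."""
    return all(v is not None for v in key)


def sql_style_join(left, right, key_len):
    """Hash inner join that ignores any record whose key contains a null."""
    index = {}
    for r in right:
        k = r[:key_len]
        if key_is_joinable(k):
            index.setdefault(k, []).append(r)
    out = []
    for l in left:
        k = l[:key_len]
        if key_is_joinable(k):
            for r in index.get(k, []):
                out.append(l + r)
    return out


left = [(1, 2, None, "x"), (1, 2, 3, "y")]
right = [(1, 2, None, "p"), (1, 2, 3, "q")]
result = sql_style_join(left, right, key_len=3)
print(result)
```

        Here the (1, 2, null) records are dropped on both sides, and only the records with the fully non-null key (1, 2, 3) are merged.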

        pkamath Pradeep Kamath added a comment -

        I meant "This is an issue in both map reduce and local mode"

        pkamath Pradeep Kamath added a comment -

        This is a known issue in both map reduce and local mode.

        alangates Alan Gates added a comment -

        It seems that the right semantics would be to follow SQL consistently, as that is what we say we do.


          People

          • Assignee:
            daijy Daniel Dai
            Reporter:
            pkamath Pradeep Kamath
