Pig
  1. Pig
  2. PIG-1277

Pig should give error message when cogroup on tuple keys of different inner type

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.9.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When we cogroup on a tuple, if the inner type of tuple does not match, we treat them as different keys. This is confusing. It is desirable to give error/warnings when it happens.

      Here is one example:
      UDF:

      public class MapGenerate extends EvalFunc<Map> {
          @Override
          public Map exec(Tuple input) throws IOException {
              // TODO Auto-generated method stub
              Map m = new HashMap();
              m.put("key", new Integer(input.size()));
              return m;
          }
          
          @Override
          public Schema outputSchema(Schema input) {
              return new Schema(new Schema.FieldSchema(null, DataType.MAP));
          }
      }
      

      Pig script:

      a = load '1.txt' as (a0);
      b = foreach a generate a0, MapGenerate(*) as m:map[];
      c = foreach b generate a0, m#'key' as key;
      d = load '2.txt' as (c0, c1);
      e = cogroup c by (a0, key), d by (c0, c1);
      dump e;
      

      1.txt

      1
      

      2.txt

      1 1
      

      User expected result (which is not right):

      ((1,1),{(1,1)},{(1,1)})
      

      Real result:

      ((1,1),{(1,1)},{})
      ((1,1),{},{(1,1)})
      

      We shall give user the message that we can not merge the key due to the type mismatch.

      1. PIG-1277-1.patch
        8 kB
        Daniel Dai
      2. PIG-1277-2.patch
        8 kB
        Daniel Dai
      3. PIG-1277-3.patch
        18 kB
        Daniel Dai

        Issue Links

          Activity

          Hide
          Daniel Dai added a comment -

          The patch also aim to solve PIG-999, PIG-1065

          Show
          Daniel Dai added a comment - The patch also aim to solve PIG-999 , PIG-1065
          Hide
          Alan Gates added a comment -

          Comments:

          1. I'm not sure the null handling in NullableBytesWritable.getValueAsPigType is the same. Previously it would check specifically if the value had been marked as null. Now it looks like if there isn't an entry in the first slot of the tuple (which I think would be what would happen if it were null) it will throw an exception. I think you want to return the isNull check, and make sure the constructors properly set that value.
          2. I don't understand the change in LOUnion.
          Show
          Alan Gates added a comment - Comments: I'm not sure the null handling in NullableBytesWritable.getValueAsPigType is the same. Previously it would check specifically if the value had been marked as null. Now it looks like if there isn't an entry in the first slot of the tuple (which I think would be what would happen if it were null) it will throw an exception. I think you want to return the isNull check, and make sure the constructors properly set that value. I don't understand the change in LOUnion.
          Hide
          Daniel Dai added a comment -

          Response to Alan's comments:
          1. Yes, you are right. It introduces some subtle differences. When we see a null value, we set mNull flag, and put null into a tuple. We do read the null back; however, in NullableBytesWritable, we rely on mNull flag to do the comparison, which may results wrong result. I will change it.

          2. LOUnion is a bug fix. We shall get null schema if union two different schema. It is to fix PIG-1065. Since PIG-1065 is more of the same nature, I don't want to put fix in a separate patch.

          Show
          Daniel Dai added a comment - Response to Alan's comments: 1. Yes, you are right. It introduces some subtle differences. When we see a null value, we set mNull flag, and put null into a tuple. We do read the null back; however, in NullableBytesWritable, we rely on mNull flag to do the comparison, which may results wrong result. I will change it. 2. LOUnion is a bug fix. We shall get null schema if union two different schema. It is to fix PIG-1065 . Since PIG-1065 is more of the same nature, I don't want to put fix in a separate patch.
          Hide
          Daniel Dai added a comment -

          PIG-1277-2.patch address Alan's review comments.

          Show
          Daniel Dai added a comment - PIG-1277 -2.patch address Alan's review comments.
          Hide
          Alan Gates added a comment -

          +1

          Show
          Alan Gates added a comment - +1
          Hide
          Daniel Dai added a comment -

          test-patch result:
          [exec] -1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 9 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] -1 release audit. The applied patch generated 468 release audit warnings (more than the trunk's current 464 warnings).

          Release audit warning is because changed NullableBytesWritable construct triggered a new jdiff file.

          Unit test:
          all pass

          end-to-end test:
          all pass

          Show
          Daniel Dai added a comment - test-patch result: [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 468 release audit warnings (more than the trunk's current 464 warnings). Release audit warning is because changed NullableBytesWritable construct triggered a new jdiff file. Unit test: all pass end-to-end test: all pass
          Hide
          Daniel Dai added a comment -

          Patch committed.

          Show
          Daniel Dai added a comment - Patch committed.

            People

            • Assignee:
              Alan Gates
              Reporter:
              Daniel Dai
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development