Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11, 0.10.1
    • Component/s: internal-udfs
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      TOBAG only reports an output schema beyond {{

      {(NULL)}

      }} when all input field schemas match deeply, including field schema aliases. This seems wrong to me. Shouldn't it just require recursive type equality?

      For relevant code, see:

      http://svn.apache.org/viewvc/pig/tags/release-0.9.2/src/org/apache/pig/builtin/TOBAG.java?view=markup#l142

          private boolean nullEquals(Schema currentSchema, Schema newSchema) {
              if(currentSchema == null){
                  if(newSchema != null){
                      return false;
                  }
                  return true;
              }
              return currentSchema.equals(newSchema);
          }
      

      The included patch modifies the return line to use Schema.equals(currentSchema, newSchema, false, true) to avoid alias matching requirement.

        Activity

        Hide
        Jonathan Coveney added a comment -

        Committed. Thanks for the submissions, Andy! r1335807 and r1335806

        Show
        Jonathan Coveney added a comment - Committed. Thanks for the submissions, Andy! r1335807 and r1335806
        Hide
        Andy Schlaikjer added a comment -

        Updated patch which includes updates to javadoc.

        Show
        Andy Schlaikjer added a comment - Updated patch which includes updates to javadoc.
        Hide
        Andy Schlaikjer added a comment -

        I totally missed that bit. Thanks for pasting the example. I'll update the patch to include changes to the javadoc. Let me know if you like how it reads.

        Show
        Andy Schlaikjer added a comment - I totally missed that bit. Thanks for pasting the example. I'll update the patch to include changes to the javadoc. Let me know if you like how it reads.
        Hide
        Jonathan Coveney added a comment -

        This comes from the current comments on TOBAG:

         67  *  example 3
         68  *  grunt> describe a;                                                             
         69  *  a: {a0: (x: int),a1: (y: int)}
         70  * -- note that the inner schema is different because the alises (x & y) are different
         71  *  grunt> b = foreach a generate TOBAG(a0,a1);                                    
         72  *  grunt> describe b;                                                             
         73  *  b: {{NULL}}
        

        your patch changes that. It's an ok change IMHO, since it is a more permissive change and the old version would have forced people to manually cast if the schema mattered, but we still need to update the comments accordingly, as in the above example, it'd now be b: (x:int).

        Show
        Jonathan Coveney added a comment - This comes from the current comments on TOBAG: 67 * example 3 68 * grunt> describe a; 69 * a: {a0: (x: int ),a1: (y: int )} 70 * -- note that the inner schema is different because the alises (x & y) are different 71 * grunt> b = foreach a generate TOBAG(a0,a1); 72 * grunt> describe b; 73 * b: {{NULL}} your patch changes that. It's an ok change IMHO, since it is a more permissive change and the old version would have forced people to manually cast if the schema mattered, but we still need to update the comments accordingly, as in the above example, it'd now be b: (x:int) .
        Hide
        Andy Schlaikjer added a comment -

        Hi Jon, Thanks for the test! As for the documented semantics of the UDF, they haven't really changed-- I've just relaxed the constraint on alias matching. If there's a mismatch on types at any level, the same null output schema will be returned.

        Show
        Andy Schlaikjer added a comment - Hi Jon, Thanks for the test! As for the documented semantics of the UDF, they haven't really changed-- I've just relaxed the constraint on alias matching. If there's a mismatch on types at any level, the same null output schema will be returned.
        Hide
        Jonathan Coveney added a comment -

        Andy,

        I am testing this now and it looks good. One thing you still need to do is update the comments in TOBAG, because it explicitly makes note of the fact that if the schema of the tuples does not agree, that it will return a null schema. Now, it will take the first one.

        Thanks,
        Jon

        Show
        Jonathan Coveney added a comment - Andy, I am testing this now and it looks good. One thing you still need to do is update the comments in TOBAG, because it explicitly makes note of the fact that if the schema of the tuples does not agree, that it will return a null schema. Now, it will take the first one. Thanks, Jon
        Hide
        Jonathan Coveney added a comment -

        This looks good to me. I will try to commit it this weekend if I have a chance to run test-commit against 0.10 and 0.11 (I am still in Asia).

        Show
        Jonathan Coveney added a comment - This looks good to me. I will try to commit it this weekend if I have a chance to run test-commit against 0.10 and 0.11 (I am still in Asia).

          People

          • Assignee:
            Unassigned
            Reporter:
            Andy Schlaikjer
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development