Pig / PIG-313

Error handling aggregate of a computation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Query which fails:

      a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, gpa:double);
      b = group a by name;
      c = foreach b generate group, SUM(a.age*a.gpa);
      store c into ':OUTPATH:';

      Error output:

      2008-07-14 16:34:08,684 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: testhost.com:8020
      2008-07-14 16:34:08,741 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
      2008-07-14 16:34:08,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: testhost.com:50020
      2008-07-14 16:34:09,251 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot evaluate output type of Mul/Div Operator
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem resolving LOForEach schema
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe problem found during validation org.apache.pig.impl.plan.PlanValidationException: An unexpected exception caused the validation to stop
      2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Unable to store for alias: c
      2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - java.io.IOException: Unable to store for alias: c

        Activity

        Daniel Dai added a comment -

        Review notes:
        https://reviews.apache.org/r/276/

        Patch committed to trunk.

        Richard Ding added a comment -

        +1

        Daniel Dai added a comment -

        Running it on trunk, I get a meaningful error message from the front end:
        ERROR 1039: In alias c, incompatible types in Multiplication Operator left hand side:bag right hand side:bag

        I am attaching a test case to make sure this error message is still produced once the new TypeChecker is in place.

        Olga Natkovich added a comment -

        Pi is correct: we do not support this right now. One idea we considered for future work is to define the + operator on bags to match SQL semantics. Other approaches are also possible.

        Pradeep Kamath added a comment -

        Another case of this issue is the following:

        a = load 'singlefile/studenttab10k' as (name, age, gpa);
        b = group a ALL;
        c = foreach b generate SUM((int)(a.age)), MIN((int)(a.age)), MAX((int)(a.age)), AVG((int)(a.age)), MIN((chararray)(a.name)), MAX((chararray)(a.name)), SUM((double)(a.gpa)), MIN((double)(a.gpa)), MAX((double)(a.gpa)), AVG((double)(a.gpa));
        store c into 'outdir';
        

        In this case, the cast fails because it tries to cast a bag of bytearrays to int. It should instead cast each bytearray to int and then supply the resulting bag of ints to SUM(), MIN(), and the other aggregates.
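The difference between the two behaviors can be modeled in a few lines of Python. This is not Pig code and not how Pig implements casts; it is a sketch that assumes Python `bytes` stand in for Pig's bytearray and a Python list stands in for a bag. The names `AGES`, `cast_bag_to_int`, `cast_each_to_int`, and `aggregate` are hypothetical.

```python
# Hypothetical bag of untyped values, standing in for a.age when the load
# statement declares no types (Python bytes model Pig's bytearray).
AGES = [b"20", b"22", b"30"]


def cast_bag_to_int(bag):
    """Models the failing plan: cast the whole bag to int in one step."""
    return int(bag)  # raises TypeError: a list cannot be converted to int


def cast_each_to_int(bag):
    """Models the desired behavior: cast every element, yielding a bag of ints."""
    return [int(value) for value in bag]


def aggregate(bag):
    """Feed the per-element casts to SUM, MIN, MAX, and AVG equivalents."""
    ints = cast_each_to_int(bag)
    return sum(ints), min(ints), max(ints), sum(ints) / len(ints)


if __name__ == "__main__":
    print(aggregate(AGES))
```

With the per-element reading, each aggregate sees ordinary ints, so only the conversion of individual values can fail, never the bag as a whole.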

        Pi Song added a comment -

        That sounds right. Then this is a problem in the parser.

        Pradeep Kamath added a comment -

        Per http://wiki.apache.org/pig/PigTypesFunctionalSpec, the last section ("Argument Construction for Functions") says that the computation will be done on the fields of each tuple in the group, and the computed results will be stored in a bag and then supplied to SUM. Is this not going to be the case in the new Pig types release? If not, the wiki should be updated.
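The per-tuple reading described in the wiki can be sketched as a small Python model of `group a by name` followed by `SUM(a.age*a.gpa)`. This is an illustration of the intended semantics only, not Pig's implementation. The sample rows and the helper names `group_by_name` and `sum_of_products` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for studenttab10k: (name, age, gpa).
ROWS = [
    ("alice", 20, 3.0),
    ("alice", 22, 2.5),
    ("bob", 30, 4.0),
]


def group_by_name(rows):
    """Models `b = group a by name;`: map each key to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)


def sum_of_products(bag):
    """Models SUM(a.age*a.gpa) under the per-tuple reading:
    compute age*gpa for each tuple, collect the results in a bag, then sum it."""
    products = [age * gpa for _name, age, gpa in bag]
    return sum(products)


if __name__ == "__main__":
    for name, bag in sorted(group_by_name(ROWS).items()):
        print(name, sum_of_products(bag))
```

Within a single grouped bag this reading is well defined, because both fields come from the same tuple.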

        Pi Song added a comment -

        I think we discussed this before, and the conclusion was that we don't support this.

        Consider this query:

        b = cogroup a1 by name, a2 by name;
        c = foreach b generate group, SUM(a1.age*a2.gpa);
        store c into ':OUTPATH:';

        This is difficult for us because a1.age yields a bag and a2.gpa also yields a bag.
        What is the definition of a bag multiplied by a bag?
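To see why the question has no obvious answer, here is a Python sketch of two plausible definitions applied to the bags of one cogroup key. Neither is Pig behavior; both are hypothetical candidates, and the bag contents and the names `mul_cross` and `mul_zip` are made up for illustration.

```python
from itertools import product

# Hypothetical bags for one cogroup key; a1 and a2 can contribute different counts.
A1_AGE = [20, 22]
A2_GPA = [3.0, 2.5, 4.0]


def mul_cross(left, right):
    """Candidate 1: multiply every pair of elements (a cross product, like a join)."""
    return [x * y for x, y in product(left, right)]


def mul_zip(left, right):
    """Candidate 2: pair elements by position. Bags are unordered, so the pairing
    is arbitrary, and zip silently drops the unmatched extra elements."""
    return [x * y for x, y in zip(left, right)]


if __name__ == "__main__":
    print(sum(mul_cross(A1_AGE, A2_GPA)))
    print(sum(mul_zip(A1_AGE, A2_GPA)))
```

The two candidates give different sums for the same input (399.0 versus 115.0 here), and the cross-product version collapses to SUM(a1.age) * SUM(a2.gpa), which suggests why an explicit definition would be needed before bag arithmetic could be supported.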


          People

          • Assignee:
            Alan Gates
            Reporter:
            Pradeep Kamath
          • Votes:
            0
            Watchers:
            0
