Pig / PIG-313

Error handling aggregate of a computation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Query which fails:

      a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, gpa:double);
      b = group a by name;
      c = foreach b generate group, SUM(a.age*a.gpa);
      store c into ':OUTPATH:';
      

      Error output:

      2008-07-14 16:34:08,684 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: testhost.com:8020
      2008-07-14 16:34:08,741 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
      2008-07-14 16:34:08,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: testhost.com:50020
      2008-07-14 16:34:09,251 [main] WARN org.apache.hadoop.fs.FileSystem - "testhost.com:8020" is a deprecated filesystem name. Use "hdfs://testhost:8020/" instead.
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot evaluate output type of Mul/Div Operator
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem resolving LOForEach schema
      2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe problem found during validation org.apache.pig.impl.plan.PlanValidationException: An unexpected exception caused the validation to stop
      2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - java.io.IOException: Unable to store for alias: c
      2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - java.io.IOException: Unable to store for alias: c

        Activity

        Pradeep Kamath created issue -
        Pi Song added a comment -

        I think we have discussed this before, and the conclusion was that we don't support it.

        Consider this query:

        b = cogroup a1 by name, a2 by name;
        c = foreach b generate group, SUM(a1.age*a2.gpa);
        store c into ':OUTPATH:';
        

        This will make it difficult for us because a1.age gives a bag and a2.gpa also gives a bag.
        What is the definition of bag multiplied by bag?
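
        To illustrate the ambiguity, here is a small Python sketch (not Pig code). The group contents and the helper `project` are made up for illustration. After a cogroup, each projection is a bag whose size depends on its own input, so there is no single obvious way to pair elements:

```python
# Hypothetical cogroup result for one key: two bags of tuples,
# modeled here as lists of dicts.
group = {
    "a1": [{"name": "alice", "age": 20},
           {"name": "alice", "age": 22},
           {"name": "alice", "age": 25}],
    "a2": [{"name": "alice", "gpa": 3.5},
           {"name": "alice", "gpa": 3.9}],
}

def project(bag, field):
    """Model of a1.age: projecting one field out of a bag yields another bag."""
    return [t[field] for t in bag]

ages = project(group["a1"], "age")   # 3 elements
gpas = project(group["a2"], "gpa")   # 2 elements

# Two candidate definitions of "ages * gpas"; neither is Pig behavior.
# Element-wise pairing silently drops the unmatched age.
pairwise = [x * y for x, y in zip(ages, gpas)]
# A cross product multiplies every age by every gpa.
cross = [x * y for x in ages for y in gpas]
```

        The two candidates produce bags of different sizes (2 and 6 elements here), which is why bag times bag has no agreed meaning.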

        Pi Song made changes -
        Field Original Value New Value
        Resolution Won't Fix [ 2 ]
        Status Open [ 1 ] Resolved [ 5 ]
        Pradeep Kamath added a comment -

        Per http://wiki.apache.org/pig/PigTypesFunctionalSpec, the last section ("Argument Construction for Functions") says that the computation will be done on the fields of each tuple in the group, and that the computed results will be stored in a bag and then supplied to SUM. Is this not going to be the case in the new Pig types release? If not, the wiki should be updated.
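
        The semantics described in that wiki section can be modeled with a short Python sketch. This is an illustration only, not Pig's implementation: the rows are invented, and `group_by_name` and `sum_of_products` are hypothetical helpers.

```python
# Hypothetical rows of (name, age, gpa); not taken from studenttab10k.
rows = [("alice", 20, 3.5), ("bob", 30, 2.0), ("alice", 22, 4.0)]

def group_by_name(rows):
    """Model of `group a by name`: map each name to its bag of tuples."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return groups

def sum_of_products(bag):
    """Evaluate age * gpa once per tuple, collect the results in a bag, then SUM."""
    products = [age * gpa for _, age, gpa in bag]
    return sum(products)

result = {name: sum_of_products(bag)
          for name, bag in group_by_name(rows).items()}
print(result)  # {'alice': 158.0, 'bob': 60.0}
```

        Because both fields come from the same tuple, the multiplication is well defined here, unlike the cogroup case with two independent bags.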

        Pi Song added a comment -

        That sounds right. Then this is a problem in the parser.

        Pi Song made changes -
        Status Resolved [ 5 ] Reopened [ 4 ]
        Resolution Won't Fix [ 2 ]
        Pradeep Kamath added a comment -

        Another case of this issue is the following:

        a = load 'singlefile/studenttab10k' as (name, age, gpa);
        b = group a ALL;
        c = foreach b generate SUM((int)(a.age)), MIN((int)(a.age)), MAX((int)(a.age)), AVG((int)(a.age)), MIN((chararray)(a.name)), MAX((chararray)(a.name)), SUM((double)(a.gpa)), MIN((double)(a.gpa)), MAX((double)(a.gpa)), AVG((double)(a.gpa));
        store c into 'outdir';
        

        In this case, the cast fails because it tries to cast a bag of bytearrays to int. However, it should really cast each bytearray to int and then supply the bag of ints to SUM() etc.
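
        The requested behavior can be sketched in Python as follows. This is a model under stated assumptions, not Pig code: the rows are invented, raw `bytes` values stand in for Pig's bytearray, and `cast_each` is a hypothetical helper.

```python
# Model of an untyped load: every field arrives as raw bytes,
# standing in for Pig's bytearray. The rows are made up for illustration.
bag = [(b"alice", b"20", b"3.5"), (b"bob", b"30", b"2.0")]

def cast_each(bag, index, to_type):
    """Cast one field of every tuple and return a bag of typed values."""
    return [to_type(row[index].decode("utf-8")) for row in bag]

ages = cast_each(bag, 1, int)      # bag of ints
gpas = cast_each(bag, 2, float)    # bag of doubles

total_age = sum(ages)   # what SUM((int)(a.age)) is expected to compute
max_gpa = max(gpas)     # what MAX((double)(a.gpa)) is expected to compute

# Casting the bag as a whole is the failing case from the report.
try:
    int(bag)
    whole_bag_cast_ok = True
except TypeError:
    whole_bag_cast_ok = False
```

        The cast is applied per element before aggregation. Applying it to the bag itself fails, which mirrors the reported error.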

        Olga Natkovich added a comment -

        Pi is correct - we do not support this right now. One idea we considered for future work is to define the + operator on bags to match SQL semantics. Other approaches are also possible.

        Olga Natkovich made changes -
        Priority Major [ 3 ] Minor [ 4 ]
        Nigel Daley made changes -
        Fix Version/s 1.0.0 [ 12313288 ]
        Nigel Daley made changes -
        Affects Version/s 0.2.0 [ 12313783 ]
        Affects Version/s 1.0.0 [ 12313288 ]
        Olga Natkovich made changes -
        Fix Version/s 0.9.0 [ 12315191 ]
        Alan Gates made changes -
        Assignee Alan Gates [ alangates ]
        Daniel Dai added a comment -

        Running it on trunk, I get a meaningful error message in the front end:
        ERROR 1039: In alias c, incompatible types in Multiplication Operator left hand side:bag right hand side:bag

        I am attaching a test case to make sure this error message is still produced with the new TypeChecker.

        Daniel Dai made changes -
        Attachment PIG-313-1.patch [ 12468163 ]
        Richard Ding added a comment -

        +1

        Daniel Dai added a comment -

        Review notes:
        https://reviews.apache.org/r/276/

        Patch committed to trunk.

        Daniel Dai made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Olga Natkovich made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Transition            Time In Source Status  Execution Times  Last Executor   Last Execution Date
        Open -> Resolved      1d 12h 5m              1                Pi Song         16/Jul/08 13:25
        Resolved -> Reopened  5d 47m                 1                Pi Song         21/Jul/08 14:13
        Reopened -> Resolved  917d 5h 32m            1                Daniel Dai      24/Jan/11 18:45
        Resolved -> Closed    191d 5h 49m            1                Olga Natkovich  04/Aug/11 01:34

          People

          • Assignee: Alan Gates
          • Reporter: Pradeep Kamath
          • Votes: 0
          • Watchers: 0