Pig / PIG-2119

DuplicateForEachColumnRewrite makes assumptions about the position of LOGenerate in the plan

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0, 0.9.1
    • Fix Version/s: 0.9.2, 0.10.0, 0.11
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The input:

      grunt> cat b.txt
      a       11
      b       3
      c       10
      a       12
      b       10
      c       15
      

      The script:

      a = load 'b.txt' AS (id:chararray, num:int);
      b = group a by id;
      c = foreach b { 
        d = order a by num DESC;
        n = COUNT(a);
        e = limit d 1;
        generate n;
      }
      

      The exception:

      Caused by: java.lang.ClassCastException: org.apache.pig.newplan.logical.relational.LOLimit cannot be cast to org.apache.pig.newplan.logical.relational.LOGenerate
              at org.apache.pig.newplan.logical.rules.DuplicateForEachColumnRewrite$DuplicateForEachColumnRewriteTransformer.check(DuplicateForEachColumnRewrite.java:87)
              at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:108)
      
      

      I know the script is a bit pointless, but I was just testing and modifying the script bit by bit.
      If I remove the limit, I still get the same exception, but with LOSort instead.

      The problem, I think, is that the rule assumes the nested block has only one sink and that this sink is an LOGenerate.
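      For illustration only (this is not the committed patch): a minimal sketch of the kind of guard the rule could apply before casting a nested-plan sink to LOGenerate, assuming Pig's newplan API (LOForEach.getInnerPlan(), OperatorPlan.getSinks()); the class name is hypothetical. A dangling nested alias, such as the unused e = limit d 1 branch above, shows up as an extra sink that is not an LOGenerate.

      // Sketch only, not the actual fix. Assumes org.apache.pig.newplan APIs:
      // LOForEach.getInnerPlan() and OperatorPlan.getSinks().
      import java.util.List;

      import org.apache.pig.newplan.Operator;
      import org.apache.pig.newplan.logical.relational.LOForEach;
      import org.apache.pig.newplan.logical.relational.LOGenerate;
      import org.apache.pig.newplan.logical.relational.LogicalPlan;

      public class NestedSinkGuard {
          /**
           * Returns true only if every sink of the FOREACH's inner plan is an
           * LOGenerate. A dangling nested alias (an unused ORDER or LIMIT)
           * appears as an extra sink, which is what the rule trips over when
           * it blindly casts the sink to LOGenerate.
           */
          public static boolean hasOnlyGenerateSinks(LOForEach foreach) {
              LogicalPlan innerPlan = foreach.getInnerPlan();
              List<Operator> sinks = innerPlan.getSinks();
              for (Operator sink : sinks) {
                  if (!(sink instanceof LOGenerate)) {
                      return false; // bail out instead of casting and throwing ClassCastException
                  }
              }
              return true;
          }
      }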

        Activity

        Transition           Time In Source Status   Execution Times   Last Executer   Last Execution Date
        Open → Resolved      142d 18h 2m              1                 Daniel Dai      02/Nov/11 06:19
        Resolved → Closed    82d 1h 11m               1                 Daniel Dai      23/Jan/12 07:31
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Daniel Dai added a comment -

        Patch committed to 0.9 branch as per Dmitriy's request (PIG-2474)

        Daniel Dai made changes -
        Fix Version/s 0.9.2 [ 12318248 ]
        Daniel Dai made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Fix Version/s 0.11 [ 12318878 ]
        Resolution Fixed [ 1 ]
        Daniel Dai added a comment -

        Unit tests pass. Test-patch:
        [exec] -1 overall.
        [exec]
        [exec] +1 @author. The patch does not contain any @author tags.
        [exec]
        [exec] +1 tests included. The patch appears to include 6 new or modified tests.
        [exec]
        [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
        [exec]
        [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
        [exec]
        [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
        [exec]
        [exec] -1 release audit. The applied patch generated 458 release audit warnings (more than the trunk's current 447 warnings).

        All new files have proper headers.

        Patch committed to both trunk and 0.10 branch.

        Thejas M Nair made changes -
        Affects Version/s 0.9.0 [ 12315191 ]
        Affects Version/s 0.9.1 [ 12317343 ]
        Thejas M Nair added a comment -

        +1

        Daniel Dai made changes -
        Attachment PIG-2119-1.patch [ 12501124 ]
        Daniel Dai made changes -
        Attachment PIG-2119-1.patch [ 12501123 ]
        Daniel Dai made changes -
        Attachment PIG-2119-1.patch [ 12501123 ]
        Daniel Dai made changes -
        Attachment PIG-2119-1.patch [ 12501122 ]
        Daniel Dai made changes -
        Assignee Daniel Dai [ daijy ]
        Daniel Dai made changes -
        Attachment PIG-2119-1.patch [ 12501122 ]
        Olga Natkovich made changes -
        Field Original Value New Value
        Fix Version/s 0.10 [ 12316246 ]
        Vivek Padmanabhan added a comment -

        Faced this issue with the below script:

        A = load '3char_1long_tab' as (f1:chararray, f2:chararray, f3:chararray,ct:long);
        B = GROUP A  BY f1;
        C =    FOREACH B {
                zip_ordered = ORDER A BY f3 ASC; 
                GENERATE
                        FLATTEN(group) AS f1,	
                        A.(f3, ct),
        		--COUNT(zip_ordered),
                        SUM(A.ct) AS total;
          };
        
        dump C;
        

        The zip_ordered alias is an accident and is not used, but Pig 0.8 silently ignores it while Pig 0.9 throws an exception.
        I believe the affected version should be 0.9.

        Daniel Dai added a comment -

        LOGenerate is the only sink in the nested plan. This is one of the basic assumptions we make throughout the logical plan. To solve the issue above, we need to prune the dangling branch before proceeding.

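        A rough sketch of what pruning a dangling nested branch along the lines of the comment above could look like; this is not the committed patch, the class name is hypothetical, and it assumes the OperatorPlan mutation API (getSinks(), getPredecessors(), disconnect(), remove()), which the real fix may not use in this exact form.

        // Illustration only; the committed patch may prune differently.
        import java.util.ArrayList;
        import java.util.List;

        import org.apache.pig.impl.logicalLayer.FrontendException;
        import org.apache.pig.newplan.Operator;
        import org.apache.pig.newplan.OperatorPlan;
        import org.apache.pig.newplan.logical.relational.LOGenerate;

        public class DanglingBranchPruner {
            /**
             * Repeatedly removes nested-plan sinks that are not LOGenerate
             * (e.g. an unused ORDER or LIMIT alias), so the nested plan is left
             * with LOGenerate as its only sink before rules such as
             * DuplicateForEachColumnRewrite run.
             */
            public static void prune(OperatorPlan innerPlan) throws FrontendException {
                boolean removed = true;
                while (removed) {
                    removed = false;
                    // Copy the sink list because the plan is mutated inside the loop.
                    for (Operator sink : new ArrayList<Operator>(innerPlan.getSinks())) {
                        if (sink instanceof LOGenerate) {
                            continue;
                        }
                        List<Operator> preds = innerPlan.getPredecessors(sink);
                        if (preds != null) {
                            for (Operator pred : new ArrayList<Operator>(preds)) {
                                innerPlan.disconnect(pred, sink);
                            }
                        }
                        innerPlan.remove(sink); // a predecessor may now become a new dangling sink
                        removed = true;
                    }
                }
            }
        }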
        Gianmarco De Francisci Morales created issue -

          People

          • Assignee: Daniel Dai
          • Reporter: Gianmarco De Francisci Morales
          • Votes: 0
          • Watchers: 2
