PIG-834: Incorrect plan when algebraic functions are nested

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: impl
    • Labels: None

      Description

      a = load 'students.txt' as (c1,c2,c3,c4);
      c = group a by c2;
      f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

      Notice that the Distinct UDF is missing from the combine and reduce stages. As a result, Distinct has no effect and incorrect results are produced.
      Distinct should have been evaluated in all three stages (map, combine, and reduce), and its output should have been passed to COUNT in the reduce stage.
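The evaluation the report asks for can be sketched as a minimal Python model. This is not Pig code. The sample data, the two-task split, and the helper names `distinct_initial`, `distinct_merge`, and `intended_plan` are hypothetical. Python frozensets stand in for Pig bags, and the model assumes each map-side call sees a single record. In the model, Distinct runs in map, combine, and reduce, and COUNT is applied only to Distinct's final output.

```python
from collections import defaultdict

# Hypothetical sample data: (c1, c2, c3, c4) records split across two map tasks.
SPLITS = [
    [("a", "g1", "x", 1), ("b", "g1", "x", 2), ("c", "g2", "y", 3)],
    [("d", "g1", "x", 4), ("e", "g1", "z", 5), ("f", "g2", "y", 6)],
]

def distinct_initial(bag):
    # Hypothetical map-side stage: deduplicate the bag this call receives.
    return frozenset(bag)

def distinct_merge(partials):
    # Hypothetical combine/reduce stage: a union of deduplicated sets
    # is itself deduplicated, so partial results can be merged safely.
    return frozenset().union(*partials)

def intended_plan(splits):
    """Count distinct c3 per c2, with Distinct evaluated in all three stages."""
    shuffled = defaultdict(list)
    for split in splits:
        # Map: Distinct's initial stage runs once per (single-record) bag.
        map_out = [(c2, distinct_initial([c3])) for _c1, c2, c3, _c4 in split]
        # Combine: merge the partial distinct sets per key within this task.
        per_key = defaultdict(list)
        for key, partial in map_out:
            per_key[key].append(partial)
        for key, partials in per_key.items():
            shuffled[key].append(distinct_merge(partials))
    # Reduce: final Distinct merge across tasks, then COUNT on its output.
    return {key: len(distinct_merge(parts)) for key, parts in shuffled.items()}
```

Group g1 holds the c3 values x, x, x, and z, so the model returns 2 for g1 and 1 for g2. Merging by set union is what allows Distinct to run in the combiner without changing the final answer.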

      # Map Reduce Plan                                  
      #--------------------------------------------------
      MapReduce node 1-122
      Map Plan
      Local Rearrange[tuple]{bytearray}(false) - 1-139
      |   |
      |   Project[bytearray][1] - 1-140
      |
      |---New For Each(false,false)[bag] - 1-127
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
          |   |
          |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
          |       |
          |       |---Project[bag][2] - 1-123
          |           |
          |           |---Project[bag][1] - 1-124
          |   |
          |   Project[bytearray][0] - 1-133
          |
          |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
              |
              |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
      Combine Plan
      Local Rearrange[tuple]{bytearray}(false) - 1-143
      |   |
      |   Project[bytearray][1] - 1-144
      |
      |---New For Each(false,false)[bag] - 1-132
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
          |   |
          |   |---Project[bag][0] - 1-135
          |   |
          |   Project[bytearray][1] - 1-134
          |
          |---POCombinerPackage[tuple]{bytearray} - 1-137--------
      Reduce Plan
      Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
      |
      |---New For Each(false)[bag] - 1-120
          |   |
          |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
          |   |
          |   |---Project[bag][0] - 1-136
          |
          |---POCombinerPackage[tuple]{bytearray} - 1-145--------
      Global sort: false
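A minimal Python model of this plan shows why it yields wrong counts. This is not Pig code. It assumes, as the Pre Combiner Local Rearrange suggests, that each map-side call of the foreach sees a bag holding a single record. The sample data and the name `plan_as_shown` are hypothetical, and Python lists and sets stand in for Pig bags.

```python
from collections import defaultdict

# Hypothetical sample data: (c1, c2, c3, c4) records split across two map tasks.
SPLITS = [
    [("a", "g1", "x", 1), ("b", "g1", "x", 2), ("c", "g2", "y", 3)],
    [("d", "g1", "x", 4), ("e", "g1", "z", 5), ("f", "g2", "y", 6)],
]

def plan_as_shown(splits):
    """Model of the plan above: COUNT is split across stages, Distinct is not."""
    shuffled = defaultdict(list)
    for split in splits:
        # Map: each record reaches the foreach as a one-record bag, so
        # Distinct sees a single value and COUNT$Initial yields 1.
        map_out = [(c2, len(set([c3]))) for _c1, c2, c3, _c4 in split]
        # Combine: COUNT$Intermediate sums partial counts per key within a task.
        combined = defaultdict(int)
        for key, n in map_out:
            combined[key] += n
        for key, n in combined.items():
            shuffled[key].append(n)
    # Reduce: COUNT$Final sums again. Distinct never compares values from
    # different records, so duplicates are counted.
    return {key: sum(parts) for key, parts in shuffled.items()}
```

On this data the model returns 4 for g1 and 2 for g2. These are the plain record counts, although g1 has only two distinct c3 values (x and z). This matches the symptom in the report: Distinct has no effect.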
      
      Attachments

      1. pig-834.patch (3 kB, Ashutosh Chauhan)
      2. pig-834_2.patch (5 kB, Ashutosh Chauhan)
      3. pig-834_3.patch (5 kB, Ashutosh Chauhan)

        Activity

        Thejas M Nair created issue -
        Thejas M Nair made changes -
        Field Original Value New Value
        Description (edited to wrap the Map Reduce plan in {code} markup; otherwise identical to the Description above)
        Thejas M Nair made changes -
        Fix Version/s: 0.7.0 [ 12314397 ]
        Olga Natkovich made changes -
        Priority: Critical [ 2 ] -> Major [ 3 ]
        Olga Natkovich made changes -
        Assignee: Ashutosh Chauhan [ ashutoshc ]
        Ashutosh Chauhan made changes -
        Attachment: pig-834.patch [ 12434920 ]
        Ashutosh Chauhan made changes -
        Attachment: pig-834_2.patch [ 12435027 ]
        Ashutosh Chauhan made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Ashutosh Chauhan made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Ashutosh Chauhan made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Ashutosh Chauhan made changes -
        Status: Patch Available [ 10002 ] -> Open [ 1 ]
        Ashutosh Chauhan made changes -
        Attachment: pig-834_3.patch [ 12435493 ]
        Ashutosh Chauhan made changes -
        Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Ashutosh Chauhan made changes -
        Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
        Resolution: Fixed [ 1 ]
        Daniel Dai made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee: Ashutosh Chauhan
          • Reporter: Thejas M Nair
          • Votes: 0
          • Watchers: 0
