Pig / PIG-2530

Reusing alias name in nested foreach causes incorrect results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.9.2, 0.10.0
    • Fix Version/s: 0.10.0, 0.9.3, 0.11
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The script below produces incorrect output with Pig 0.10 but runs correctly with Pig 0.8:

      input.txt
      1	4
      1	3
      2	3
      2	4
      
      a = load 'input.txt' as (v1:int, v2:int);
      b = group a by v1;
      c = foreach b { x = a; x = order x by v2 asc; generate flatten(x); }
      store c into 'c1';
      

      Output from Pig 0.10
      --------------------
      1 4
      1 3
      2 3
      2 4

      Judging by the explain output, the sort appears to be dropped entirely.
      The script produces correct results if I use a different alias name for the sorted relation, i.e.:

      c = foreach b { x = a; x1 = order x by v2 asc; generate flatten(x1); }
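
For reference, the result the report expects (each group's rows ordered by v2 ascending) can be modeled outside Pig. The following is a minimal Python sketch of the intended semantics, not Pig code and not part of the fix. The helper name `nested_order` is hypothetical, and the sketch emits groups in ascending v1 order, although Pig does not guarantee the order in which groups are emitted.

```python
from itertools import groupby

# Rows from input.txt in the report, as (v1, v2) pairs.
rows = [(1, 4), (1, 3), (2, 3), (2, 4)]

def nested_order(rows):
    """Model of: group by v1, order each bag by v2 asc, then flatten.

    Hypothetical helper for illustration only.
    """
    out = []
    # itertools.groupby only groups adjacent items, so sort by the key first.
    by_v1 = sorted(rows, key=lambda r: r[0])
    for _, bag in groupby(by_v1, key=lambda r: r[0]):
        # The nested "order x by v2 asc" step that Pig 0.10 skips.
        out.extend(sorted(bag, key=lambda r: r[1]))
    return out

for v1, v2 in nested_order(rows):
    print(f"{v1}\t{v2}")
```

Under these assumptions the sketch prints 1 3, 1 4, 2 3, 2 4, which differs from the Pig 0.10 output shown above, where the rows come back in input order.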

        Attachments

        1. PIG-2530-0.patch
          4 kB
          Daniel Dai
        2. PIG-2530-1.patch
          6 kB
          Thomas Weise
        3. PIG-2530-2.patch
          2 kB
          Daniel Dai

            People

            • Assignee: Daniel Dai (daijy)
            • Reporter: Vivek Padmanabhan (vivekp)