Pig / PIG-1724

Multiquery optimization miscalculates the parallelism and produces extra 0-byte part files (Pig 0.7 and 0.8)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.0, 0.8.0
    • Fix Version/s: 0.7.0, 0.8.0
    • Component/s: impl
    • Labels:
      None

      Description

      We have found an issue in Pig 0.7 and Pig 0.8 when multiquery optimization is used: the script below produces more part files than required, and the extra files are 0 bytes. Note that the GROUP ALL statements are dummies in this case.

      record002 = LOAD 'samplepig001.in' AS (id:chararray, num:int);
      f_records002 = FILTER record002 BY num != 50000;

      -- Store 1: dummy GROUP ALL, one reducer requested
      group01 = GROUP f_records002 ALL PARALLEL 1;
      STORE group01 INTO 'pig_out_direc_SET1';

      -- Store 2: stores the filtered relation set2, not set2_Group
      set2 = FILTER f_records002 BY num != 200002;
      set2_Group = GROUP set2 ALL PARALLEL 1;
      STORE set2 INTO 'pig_out_direc_SET2';

      -- Store 3: GROUP BY id, 40 reducers requested
      set3 = FILTER f_records002 BY num != 100001;
      set3_Group = GROUP set3 BY id PARALLEL 40;
      --set3_Rec4 = FILTER set3_Group BY num != 5000000;
      STORE set3_Group INTO 'pig_out_direc_SET3';
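
The listings reported for this script show pig_out_direc_SET1 with 40 part files even though its GROUP requests PARALLEL 1. One reading consistent with that count is that the merged multiquery job runs all reduce-side stores with a single reducer count equal to the largest PARALLEL value. This is an assumption and is not confirmed anywhere in this report. The Python sketch below is a toy model of that assumed rule only; the names `requested`, `merged_reducers`, `part_files_per_store` and `extra_part_files` are hypothetical and are not part of Pig.

```python
# Toy model only: this is NOT Pig's code. It assumes the multiquery-merged
# job runs every reduce-side STORE with one shared reducer count.

# PARALLEL requested by the GROUP feeding each reduce-side STORE in the script.
requested = {
    "pig_out_direc_SET1": 1,   # GROUP f_records002 ALL PARALLEL 1
    "pig_out_direc_SET3": 40,  # GROUP set3 BY id PARALLEL 40
}


def merged_reducers(parallel_by_store):
    """Assumed rule: the merged job uses the largest requested PARALLEL."""
    return max(parallel_by_store.values())


def part_files_per_store(parallel_by_store):
    """Assumed: every store directory gets one part file per reducer."""
    n = merged_reducers(parallel_by_store)
    return {store: n for store in parallel_by_store}


def extra_part_files(parallel_by_store):
    """Part files beyond what each store's own PARALLEL asked for."""
    produced = part_files_per_store(parallel_by_store)
    return {s: produced[s] - want for s, want in parallel_by_store.items()}


if __name__ == "__main__":
    print(part_files_per_store(requested))
    print(extra_part_files(requested))
```

Under this model pig_out_direc_SET1 receives 39 more part files than its PARALLEL 1 asks for, which matches the "Found 40 items" count in the report. pig_out_direc_SET2 is left out because the script stores set2 without a GROUP.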
      

      Running this script in Pig 0.8 produces the following output:

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET1
      ...
      Found 40 items
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00000
      ...
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00039

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET2
      Found 1 items
      -rw------- 3 viraj users 110 2010-11-13 02:08 /user/viraj/pig_out_direc_SET2/part-m-00000

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET3
      Found 40 items
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET3/part-r-00000
      ...
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET3/part-r-00039
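
Listings like the ones above can be checked mechanically. The following Python sketch counts the 0-byte part files in captured `hadoop fs -ls` text. It assumes the usual eight-column file entry (permissions, replication, owner, group, size, date, time, path); the function name `zero_byte_parts` and the `sample` excerpt are illustrative and not part of any Hadoop or Pig tool.

```python
def zero_byte_parts(ls_output):
    """Return paths of 0-byte part files found in `hadoop fs -ls` text."""
    empty = []
    for line in ls_output.splitlines():
        fields = line.split()
        # A file entry is assumed to have 8 columns:
        # perms, replication, owner, group, size, date, time, path
        if len(fields) != 8 or not fields[4].isdigit():
            continue  # skips "Found N items", "..." and blank lines
        size, path = int(fields[4]), fields[7]
        name = path.rsplit("/", 1)[-1]
        if name.startswith("part-") and size == 0:
            empty.append(path)
    return empty


# Excerpt of the lines reported in this issue.
sample = """Found 40 items
-rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00000
...
-rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00039
-rw------- 3 viraj users 110 2010-11-13 02:08 /user/viraj/pig_out_direc_SET2/part-m-00000
"""

if __name__ == "__main__":
    for path in zero_byte_parts(sample):
        print(path)
```

Run against the excerpt, the function reports the two pig_out_direc_SET1 entries and skips the 110-byte part-m-00000 file.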

      Viraj

      1. samplepig001.in
        0.1 kB
        Viraj Bhat

        Activity

        Olga Natkovich made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]
        Olga Natkovich made changes -
          Status: Open [ 1 ] -> Resolved [ 5 ]
          Resolution: Won't Fix [ 2 ]
        Olga Natkovich made changes -
          Assignee: Richard Ding [ rding ]
        Viraj Bhat made changes -
          Attachment: samplepig001.in [ 12459513 ]
        Viraj Bhat made changes -
          Fix Version/s: 0.7.0 [ 12314397 ]
          Affects Version/s: 0.8.0 [ 12314562 ]
        Viraj Bhat created issue -

          People

          • Assignee: Richard Ding
          • Reporter: Viraj Bhat
          • Votes: 0
          • Watchers: 0
