Pig / PIG-1724

Multiquery optimization miscalculates the parallelism and results in extra 0-byte files (Pig 0.7 and 0.8)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.0, 0.8.0
    • Fix Version/s: 0.7.0, 0.8.0
    • Component/s: impl
    • Labels: None

    Description

      We have found an issue with Pig 0.8 and Pig 0.7 when using multiquery optimization: it produces more part files than required. Note that the GROUP ALL is a dummy in this case.

      record002 = LOAD 'samplepig001.in' AS (id:chararray, num:int);
      f_records002 = FILTER record002 BY num != 50000;
      group01 = GROUP f_records002 ALL PARALLEL 1;
      STORE group01 INTO 'pig_out_direc_SET1';

      set2 = FILTER f_records002 BY num != 200002;
      set2_Group = GROUP set2 ALL PARALLEL 1;
      STORE set2 INTO 'pig_out_direc_SET2';

      set3 = FILTER f_records002 BY num != 100001;
      set3_Group = GROUP set3 BY id PARALLEL 40;
      --set3_Rec4= FILTER set3_Group by num!=5000000;
      STORE set3_Group INTO 'pig_out_direc_SET3';
      

      When run with Pig 0.8, the script produces the following output:

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET1
      ...
      Found 40 items
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00000
      ...
      ...
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00039

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET2
      Found 1 items
      -rw------- 3 viraj users 110 2010-11-13 02:08 /user/viraj/pig_out_direc_SET2/part-m-00000

      $ hadoop fs -ls /user/viraj/pig_out_direc_SET3
      Found 40 items
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET3/part-r-00000
      ...
      ...
      -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET3/part-r-00039

      Viraj
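      The report's evidence is the file listing: SET1 and SET3 each hold 40 part-r-* (reduce output) files, and every entry shown is 0 bytes, although the GROUP ALL stored into SET1 asked for PARALLEL 1; SET2 holds a single map-side part-m-00000. To check such listings mechanically, here is a minimal Python sketch that tallies total, empty, map, and reduce part files from saved `hadoop fs -ls` output. It assumes the eight-column listing layout shown in the report; `summarize_parts` and `PartSummary` are hypothetical helper names, not part of Pig or Hadoop.

```python
import re
from dataclasses import dataclass

# Assumed layout of one `hadoop fs -ls` entry (8 whitespace-separated columns):
# permissions replication owner group size date time path
LS_LINE = re.compile(
    r"^(?P<perms>[-d]\S*)\s+(?P<repl>\S+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<path>\S+)$"
)


@dataclass
class PartSummary:
    """Hypothetical summary of the part files found in one output directory."""
    total: int = 0
    empty: int = 0         # part files whose size column is 0
    map_parts: int = 0     # part-m-* files (written by map tasks)
    reduce_parts: int = 0  # part-r-* files (written by reduce tasks)


def summarize_parts(listing: str) -> PartSummary:
    """Tally part files in saved `hadoop fs -ls` output.

    Lines that do not look like file entries ("Found N items", "...",
    the shell prompt) are skipped.
    """
    summary = PartSummary()
    for line in listing.splitlines():
        match = LS_LINE.match(line.strip())
        if match is None:
            continue
        name = match.group("path").rsplit("/", 1)[-1]
        if not name.startswith("part-"):
            continue
        summary.total += 1
        if int(match.group("size")) == 0:
            summary.empty += 1
        if name.startswith("part-m-"):
            summary.map_parts += 1
        elif name.startswith("part-r-"):
            summary.reduce_parts += 1
    return summary


if __name__ == "__main__":
    # The two SET1 entries shown in the report (the other 38 are elided there).
    set1_excerpt = """
    Found 40 items
    -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00000
    -rw------- 3 viraj users 0 2010-11-13 02:09 /user/viraj/pig_out_direc_SET1/part-r-00039
    """
    print(summarize_parts(set1_excerpt))
    # prints: PartSummary(total=2, empty=2, map_parts=0, reduce_parts=2)
```

      Run against a complete listing, a directory written by a single reducer (PARALLEL 1) would be expected to report reduce_parts == 1; the report's SET1 directory instead contains 40 reduce part files.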

      Attachments

      1. samplepig001.in (0.1 kB, uploaded by Viraj Bhat)


      People

        Richard Ding (rding)
        Viraj Bhat (viraj)
        Votes: 0
        Watchers: 0
