Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1672

order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.8.0
    • 0.8.0
    • None
    • None
    • Reviewed

    Description

      The replicated join query was running out of memory because the order of relations got switched during logical plan optimization and it was attempting to load the larger (left) relation into memory.

      cat replj.pig
      l1 = load 'x' as (a);
      l2 = load 'y' as (b);
      l3 = load 'z' as (a1,b1,c1,d1);
      f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
      f2 = foreach f1 generate a,b,c; 
      j1 = join f2 by a, l1 by a using 'replicated';
      j2 = join j1 by b, l2 by b using 'replicated';
      explain j2;
      
      
      Note that in the MR plan printed below, the Load in the MR job with join operations has 'x' as the input instead of 'z' .
      
      #--------------------------------------------------
      # Map Reduce Plan                                  
      #--------------------------------------------------
      MapReduce node scope-30
      Map Plan
      Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
      |
      |---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
      Global sort: false
      ----------------
      
      MapReduce node scope-27
      Map Plan
      j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
      |
      |---j2: FRJoin[tuple] - scope-20
          |   |
          |   Project[bytearray][1] - scope-18
          |   |
          |   Project[bytearray][0] - scope-19
          |
          |---j1: FRJoin[tuple] - scope-11
              |   |
              |   Project[bytearray][0] - scope-9
              |   |
              |   Project[bytearray][0] - scope-10
              |
              |---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
      Global sort: false
      ----------------
      
      MapReduce node scope-28
      Map Plan
      Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
      |
      |---f2: New For Each(false,false,false)[bag] - scope-8
          |   |
          |   Project[bytearray][0] - scope-2
          |   |
          |   Project[bytearray][1] - scope-4
          |   |
          |   Project[bytearray][2] - scope-6
          |
          |---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
      Global sort: false
      ----------------
      
      

      Attachments

        1. PIG-1672.1.patch
          2 kB
          Thejas Nair
        2. PIG-1672.2.patch
          8 kB
          Thejas Nair

        Activity

          People

            thejas Thejas Nair
            thejas Thejas Nair
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: