Pig
  1. Pig
  2. PIG-3446 Umbrella jira for Pig on Tez
  3. PIG-3620

TezCompiler adds duplicate predecessors of blocking operators to TezPlan

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      Here is a simplest example that reproduces the issue-

      test.pig
      a = LOAD 'foo' AS (x:int, y:chararray);
      b = GROUP a BY x;
      c = FOREACH b GENERATE a.x;
      STORE c INTO 'c';
      d = FOREACH b GENERATE a.y;
      STORE d INTO 'd';
      

      If you run pig -x tez_local -e 'explain -script test.pig', you will see two vertices that contains the following sub-plan-

      Tez vertex scope-27
      # Plan on vertex
      b: Local Rearrange[tuple]{int}(false) - scope-10
      |   |
      |   Project[int][0] - scope-11
      |
      |---a: New For Each(false,false)[bag] - scope-7
          |   |
          |   Cast[int] - scope-2
          |   |
          |   |---Project[bytearray][0] - scope-1
          |   |
          |   Cast[chararray] - scope-5
          |   |
          |   |---Project[bytearray][1] - scope-4
          |
          |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
      

      What's happening is that since there are 2 stores (and thus 2 data flows, i.e. a=>c and a=>d), Pig generates two physical plans. Now TezCompile compiles them into a single tez plan but adds the same sub-plan twice.

      This is an issue with any blocking operators (join, union, etc) followed by split.

      1. PIG-3620-1.patch
        49 kB
        Rohini Palaniswamy

        Activity

        No work has yet been logged on this issue.

          People

          • Assignee:
            Rohini Palaniswamy
            Reporter:
            Cheolsoo Park
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development