Pig / PIG-3446 Umbrella jira for Pig on Tez / PIG-3620

TezCompiler adds duplicate predecessors of blocking operators to TezPlan

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      Here is the simplest example that reproduces the issue:

      test.pig
      a = LOAD 'foo' AS (x:int, y:chararray);
      b = GROUP a BY x;
      c = FOREACH b GENERATE a.x;
      STORE c INTO 'c';
      d = FOREACH b GENERATE a.y;
      STORE d INTO 'd';
      

      If you run pig -x tez_local -e 'explain -script test.pig', you will see two vertices that contain the following sub-plan:

      Tez vertex scope-27
      # Plan on vertex
      b: Local Rearrange[tuple]{int}(false) - scope-10
      |   |
      |   Project[int][0] - scope-11
      |
      |---a: New For Each(false,false)[bag] - scope-7
          |   |
          |   Cast[int] - scope-2
          |   |
          |   |---Project[bytearray][0] - scope-1
          |   |
          |   Cast[chararray] - scope-5
          |   |
          |   |---Project[bytearray][1] - scope-4
          |
          |---a: Load(file:///Users/cheolsoop/workspace/pig/foo:org.apache.pig.builtin.PigStorage) - scope-0
      

      What's happening is that since there are 2 stores (and thus 2 data flows, i.e. a=>c and a=>d), Pig generates two physical plans. TezCompiler then compiles them into a single Tez plan but adds the shared sub-plan twice.

      This is an issue with any blocking operator (join, union, etc.) followed by a split.
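
The duplication can be illustrated with a minimal sketch. This is not the TezCompiler code: the operator labels and the functions `merge_naive` and `merge_dedup` are hypothetical, and each data flow is modelled simply as a chain of operator labels from load to store. The naive merge repeats the shared predecessors of the blocking operator, while the deduplicating merge adds each operator once and lets the two flows diverge after it.

```python
# Illustrative model only; names and structure are assumptions, not Pig internals.
# Each data flow is a chain of operator labels, from load to store.
flow_c = ["load a", "foreach a", "rearrange b", "package b", "foreach c", "store c"]
flow_d = ["load a", "foreach a", "rearrange b", "package b", "foreach d", "store d"]


def merge_naive(flows):
    """Add every operator of every flow as a new node, so shared prefixes repeat."""
    nodes = []
    for flow in flows:
        nodes.extend(flow)
    return nodes


def merge_dedup(flows):
    """Add each operator once and record edges, so shared predecessors are reused."""
    nodes, edges = [], set()
    for flow in flows:
        for op in flow:
            if op not in nodes:
                nodes.append(op)
        # Consecutive operators in a flow form the edges of the merged plan.
        edges.update(zip(flow, flow[1:]))
    return nodes, edges


naive_nodes = merge_naive([flow_c, flow_d])
dedup_nodes, dedup_edges = merge_dedup([flow_c, flow_d])

print(naive_nodes.count("rearrange b"))  # shared operator added once per flow
print(dedup_nodes.count("rearrange b"))  # shared operator added once overall
```

In the deduplicated result, the split shows up as two outgoing edges from the shared "package b" node rather than as two copies of everything before it.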

      1. PIG-3620-1.patch
        49 kB
        Rohini Palaniswamy

        Activity

        Rohini Palaniswamy made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Rohini Palaniswamy added a comment -

        Committed to Tez branch. Thanks for the review Cheolsoo.

        Cheolsoo Park added a comment -

        +1.

        Rohini Palaniswamy made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Rohini Palaniswamy made changes -
        Attachment PIG-3620-1.patch [ 12618823 ]
        Rohini Palaniswamy added a comment -

        https://reviews.apache.org/r/16272/

        • Removed the duplicate operators in case of split
        • Fixed multiple levels of nested splits to work
        • Added some enhancements to plan printing for easy debugging:
          • Print the connectivity between the vertices in a DAG.
          • Print which Tez vertex a POLocalRearrange connects to.
        • Changed TestTezCompiler to also include the combiner optimizer to verify the combiner plan as well.

        Testing:

        • Added tests to TestTezCompiler
        • Will add the e2e tests for Split with PIG-3626. MR multi-query is also broken now. Need to fix that as well for e2e to work.
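
As a rough illustration of what printing vertex connectivity can look like, the sketch below formats a list of DAG edges as one line per source vertex. It is an assumption-laden example, not the output format of Pig's explain: the function `format_connectivity` and the vertex names `v1`, `v2`, `v3` are hypothetical.

```python
from collections import defaultdict


def format_connectivity(edges):
    """Return one 'source -> successors' line per vertex that has outgoing edges."""
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    return [
        f"{src} -> {', '.join(sorted(dsts))}"
        for src, dsts in sorted(successors.items())
    ]


# Hypothetical DAG: one shared vertex feeding two downstream vertices.
lines = format_connectivity([("v1", "v2"), ("v1", "v3")])
print("\n".join(lines))
```

With a summary like this, a duplicated predecessor is easy to spot: it would appear as two separate source vertices instead of one vertex with two successors.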
        Rohini Palaniswamy made changes -
        Field Original Value New Value
        Assignee Rohini Palaniswamy [ rohini ]
        Cheolsoo Park created issue -

          People

          • Assignee:
            Rohini Palaniswamy
            Reporter:
            Cheolsoo Park
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
