Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      To support algebraic UDFs, among other things, a combiner is required. To start with, I propose the following initial implementation:

      • In Tez, the combiner runs as part of ShuffledMergedInput on edges, so multiple combine plans (one per edge) need to be registered in a destination vertex. Each vertex maps to a TezOperator in the Tez plan, so an array of combine plans will be stored in the TezOperator that maps to the destination vertex.
      • To register combine plans in a TezOperator, we will run a CombinerOptimizer on the Tez plan after TezCompiler generates it but before TezDagBuilder converts it into a DAG.
      • Finally, TezDagBuilder will insert the combine plans into the payload of ShuffledMergedInput while constructing the destination vertex.

      This initial implementation will allow us to run algebraic UDFs. In the future, we can implement more optimizations for limit, order-by, etc. on top of this.
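
      The three steps above can be sketched roughly as follows. This is only a simplified illustration, not the code in the attached patches: CombinerSketch, TezOp, CombinePlan, the usesAlgebraicUdf flag, and the string "payload" are all hypothetical stand-ins for the real Pig and Tez classes (TezOperator, CombinerOptimizer, TezDagBuilder, ShuffledMergedInput), and the real optimizer decides combinability by inspecting the physical plan rather than reading a flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the proposal; not Pig's actual classes. */
final class CombinerSketch {

    /** Stand-in for a physical combine plan; here it is just a label. */
    static final class CombinePlan {
        final String description;
        CombinePlan(String description) { this.description = description; }
    }

    /** Stand-in for TezOperator: one per vertex, holding one combine plan per inbound edge. */
    static final class TezOp {
        final String name;
        // Assumption: a flag replaces the real analysis of whether a combiner applies.
        final boolean usesAlgebraicUdf;
        final List<TezOp> predecessors = new ArrayList<>();
        // Combine plans keyed by source vertex name (one entry per inbound edge).
        final Map<String, CombinePlan> combinePlans = new LinkedHashMap<>();

        TezOp(String name, boolean usesAlgebraicUdf) {
            this.name = name;
            this.usesAlgebraicUdf = usesAlgebraicUdf;
        }
    }

    /** Step 2: optimizer pass run after compilation and before DAG construction. */
    static void registerCombinePlans(List<TezOp> tezPlan) {
        for (TezOp dest : tezPlan) {
            if (!dest.usesAlgebraicUdf) {
                continue; // nothing to combine for this destination vertex
            }
            for (TezOp src : dest.predecessors) {
                dest.combinePlans.put(src.name,
                        new CombinePlan("combine(" + src.name + "->" + dest.name + ")"));
            }
        }
    }

    /** Step 3: while building the destination vertex, attach each plan to its input payload. */
    static Map<String, String> buildInputPayloads(TezOp dest) {
        Map<String, String> payloads = new LinkedHashMap<>();
        for (TezOp src : dest.predecessors) {
            CombinePlan plan = dest.combinePlans.get(src.name);
            // An empty payload means "no combiner on this edge".
            payloads.put(src.name, plan == null ? "" : plan.description);
        }
        return payloads;
    }

    public static void main(String[] args) {
        TezOp scanA = new TezOp("scanA", false);
        TezOp scanB = new TezOp("scanB", false);
        TezOp group = new TezOp("group", true);
        group.predecessors.add(scanA);
        group.predecessors.add(scanB);

        registerCombinePlans(Arrays.asList(scanA, scanB, group));
        System.out.println(buildInputPayloads(group));
    }
}
```

      Keying the plans by source vertex keeps the "one combine plan per edge" relationship explicit even though, as in the proposal, the plans live on the destination operator.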

      1. PIG-3555-1.patch
        114 kB
        Cheolsoo Park
      2. PIG-3555-2.patch
        115 kB
        Cheolsoo Park
      3. PIG-3555-3.patch
        126 kB
        Cheolsoo Park
      4. PIG-3555-4.patch
        127 kB
        Cheolsoo Park

        Activity

        Cheolsoo Park added a comment -

        Committed to tez branch. Thank you Mark for the review!

        Cheolsoo Park added a comment -

        There was a mistake in the previous patch. Re-attaching a good one.

        Cheolsoo Park added a comment -

        Uploading a new patch. TestCombiner passes in tez mode, and I added it to tez-test (ant test-tez).

        Cheolsoo Park added a comment -

        Uploading a new patch.

        Cheolsoo Park added a comment -

        Attaching the first patch. RB link: https://reviews.apache.org/r/15261/

        Cheolsoo Park added a comment -

        Mark Wagner, thank you for the comment. TezEdge sounds like a good idea.

        Mark Wagner added a comment -

        I think that's a good way to do it. One comment: Tez also does combiners as part of OnFileSortedOutput (like the traditional mapred combiners). I'd propose we create a new "TezEdge" to serve as a descriptor for edges, since this is likely an area where we'll be doing a lot of optimization in the future w/ Tez (streaming edges, shuffles with no sorting, etc.) and it would be good to have some separation from TezOp. Then every TezOperator can maintain knowledge of its input and output TezEdges.


          People

          • Assignee:
            Cheolsoo Park
            Reporter:
            Cheolsoo Park
          • Votes:
            0
            Watchers:
            2
