Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: tez-branch
    • Fix Version/s: tez-branch
    • Component/s: tez
    • Labels:
      None

      Description

      Currently, DISTINCT on Tez is implemented in a straightforward manner, as described in https://issues.apache.org/jira/browse/PIG-3538.

      However, we can implement two types of combiner optimizations for DISTINCT, just as the MRCompiler does for MapReduce:
      1. A simple DistinctCombiner that discards duplicate tuples
      2. An optimizer that transforms certain uses of DISTINCT into an algebraic UDF form
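
The two optimizations listed above can be illustrated with a small sketch. This is not Pig's implementation (Pig is written in Java); it is a minimal Python model, and every function name in it is hypothetical. The first half models a combiner that drops duplicates inside each partition before the data is merged. The second half models an algebraic form of a distinct count, split into the initial, intermediate, and final stages that an algebraic UDF is built from.

```python
from itertools import chain


def distinct_combiner(tuples):
    """Hypothetical combiner: drop duplicates within one partition, keeping first-seen order."""
    return list(dict.fromkeys(tuples))


def distinct_reduce(partitions):
    """Hypothetical reduce step: merge combined partitions and drop the remaining duplicates."""
    return list(dict.fromkeys(chain.from_iterable(partitions)))


def run_distinct(partitions):
    """DISTINCT with a combiner: dedupe each partition first, then dedupe the merged result."""
    combined = [distinct_combiner(p) for p in partitions]
    return distinct_reduce(combined)


# Algebraic form of "count the distinct values" (hypothetical stage names).
def initial(value):
    """Initial stage: wrap a single input value in a partial result."""
    return {value}


def intermediate(partial_sets):
    """Intermediate stage: merge partial results; safe to apply repeatedly."""
    return set().union(*partial_sets)


def final(partial_sets):
    """Final stage: merge the last partial results and produce the answer."""
    return len(intermediate(partial_sets))


def count_distinct(partitions):
    """Run the three stages: per-value initial, per-partition intermediate, global final."""
    partials = [intermediate(initial(v) for v in p) for p in partitions]
    return final(partials)


if __name__ == "__main__":
    parts = [[("a", 1), ("a", 1), ("b", 2)], [("b", 2), ("c", 3)]]
    print(run_distinct(parts))
    print(count_distinct(parts))
```

In both cases the benefit comes from shrinking each partition before it is merged with the others, so fewer duplicate tuples need to be transferred and compared in the final step.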

        Activity

        Alex Bain added a comment - ReviewBoard posted at https://reviews.apache.org/r/16717/
        Cheolsoo Park added a comment -

        +1.

        Committed to tez branch. Thank you Alex!

          People

          • Assignee:
            Alex Bain
            Reporter:
            Alex Bain
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved: