Pig: PIG-3385

DISTINCT no longer uses custom partitioner

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0, 0.11.2
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      From user@pig.apache.org: It looks like an optimization was put in to make DISTINCT use a special partitioner, which prevents the user from setting a custom partitioner.

      1. pig-3385-v02.patch
        10 kB
        Koji Noguchi
      2. pig-3385-v01.patch
        3 kB
        Koji Noguchi

        Activity

        Daniel Dai added a comment -

        Committed to 0.11 branch.

        Koji Noguchi added a comment -

        Thanks Daniel! Can we back-port this patch and PIG-3435 to 0.11? Without them, custom partitioners are almost unusable.

        Daniel Dai added a comment -

        Patch committed to trunk. Thanks Koji!

        Daniel Dai added a comment -

        +1. Verified that DISTINCT does not work with a custom partitioner even in earlier releases.

        Koji Noguchi added a comment -

        While looking at this JIRA, I noticed that the custom partitioner is dropped when the script runs with multi-query optimization. Created PIG-3435.

        Koji Noguchi added a comment -

        Uploading a patch with a test. I noticed that the original test for custom partitioners didn't produce partition results different from the default, so I added a silly partitioner that always returns 1 (the second reducer).

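The reasoning behind the "always return 1" partitioner can be illustrated with a minimal Java sketch. This is not the code from the attached patch. It assumes a stand-in `Partitioner` class that copies only the abstract `getPartition(key, value, numPartitions)` method of Hadoop's `org.apache.hadoop.mapreduce.Partitioner`, so it compiles without Hadoop; `HashLikePartitioner`, `AlwaysSecondReducerPartitioner`, and `PartitionerDemo` are hypothetical names. With two reducers, a hash-style partitioner spreads keys over partitions 0 and 1, so a test partitioner that happens to agree with the default cannot reveal whether it was silently ignored. A partitioner that always returns 1 sends every key to the second reducer, which makes a dropped partitioner visible.

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Partitioner:
// only the abstract method is reproduced so this compiles without Hadoop.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Follows the same formula as Hadoop's default HashPartitioner:
// non-negative hash code modulo the number of reducers.
class HashLikePartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical "silly" partitioner: every key goes to partition 1,
// i.e. the second reducer (requires at least 2 reducers).
class AlwaysSecondReducerPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE> {
    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        return 1;
    }
}

class PartitionerDemo {
    public static void main(String[] args) {
        Partitioner<String, Object> custom = new AlwaysSecondReducerPartitioner<>();
        Partitioner<String, Object> hash = new HashLikePartitioner<>();
        // Compare where each key lands with 2 reducers under both partitioners.
        for (String key : new String[] {"a", "b", "c", "d"}) {
            System.out.println(key + " -> custom=" + custom.getPartition(key, null, 2)
                + ", hash=" + hash.getPartition(key, null, 2));
        }
    }
}
```

In a Pig script, a partitioner is selected with the `PARTITION BY` clause, for example `B = DISTINCT A PARTITION BY com.example.AlwaysSecondReducerPartitioner PARALLEL 2;` (the package name is hypothetical). A real Pig partitioner extends Hadoop's `Partitioner` with Pig's key type, as in the Pig documentation's `Partitioner<PigNullableWritable, Writable>` example.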
        Koji Noguchi added a comment -

        I'm wondering whether a custom partitioner ever worked for DISTINCT.

        It looks like the partitioner info is passed through POGlobalRearrange, but DISTINCT doesn't use it.

        Uploading an initial patch that just passes that info through PODistinct.

        This is my first time touching the backend code, so I'd appreciate it if someone could take a look. I'll upload a test case next.

        Siegfried Bilstein added a comment -

        Here is the Stack Overflow question I created documenting the issue: http://stackoverflow.com/questions/17554593/custom-partitioner-in-hadoop/17747335?noredirect=1#17747335

        Siegfried Bilstein added a comment -

        I observed this issue as well with DISTINCT clauses.


          People

          • Assignee:
            Koji Noguchi
            Reporter:
            Will Oberman
          • Votes:
            0
            Watchers:
            5
