Pig: PIG-1642

Order by doesn't use estimation to determine the parallelism

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      With PIG-1249, a simple heuristic is used to determine the number of reducers when it isn't specified (via PARALLEL or default_parallel). For the ORDER BY statement, however, the number of reducers still defaults to 1.
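      The precedence described above (explicit PARALLEL, then default_parallel, then a size-based estimate) can be sketched as below. This is a hypothetical illustration, not Pig's code. It assumes the PIG-1249 heuristic divides the total input size by a bytes-per-reducer threshold and caps the result at a maximum reducer count. In Pig these limits are believed to be configured by pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max, but the default values and the class and method names used here are assumptions.

```java
// Hypothetical sketch of a PIG-1249-style reducer estimate; not Pig's actual code.
class ReducerEstimate {
    // Assumed defaults (1 GB per reducer, at most 999 reducers); treat as illustrative.
    static final long DEFAULT_BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int DEFAULT_MAX_REDUCERS = 999;

    /**
     * An explicit PARALLEL value wins, then default_parallel; only when neither
     * is set (a value <= 0) do we fall back to the input-size estimate.
     */
    static int chooseParallelism(int requestedParallel, int defaultParallel,
                                 long totalInputBytes, long bytesPerReducer,
                                 int maxReducers) {
        if (requestedParallel > 0) {
            return requestedParallel;
        }
        if (defaultParallel > 0) {
            return defaultParallel;
        }
        // One reducer per bytesPerReducer of input, rounded up.
        int estimate = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        estimate = Math.max(1, estimate);        // never fewer than one reducer
        return Math.min(maxReducers, estimate);  // never more than the cap
    }

    public static void main(String[] args) {
        long input = 2500L * 1000 * 1000; // 2.5 GB of input
        // No PARALLEL, no default_parallel: estimate is ceil(2.5) = 3.
        System.out.println(chooseParallelism(-1, -1, input,
                DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // prints 3
        // Explicit PARALLEL 10 overrides the estimate.
        System.out.println(chooseParallelism(10, -1, input,
                DEFAULT_BYTES_PER_REDUCER, DEFAULT_MAX_REDUCERS)); // prints 10
    }
}
```

      Under this reading, the bug is that the ORDER BY job skipped the last fallback and used 1 instead of the estimate.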

      1. PIG-1642_1.patch
        14 kB
        Richard Ding
      2. PIG-1642_1.patch
        14 kB
        Richard Ding
      3. PIG-1642.patch
        12 kB
        Richard Ding

        Activity

        Richard Ding added a comment -

        Patch committed to both trunk and 0.8 branch.

        Thejas M Nair added a comment -

        Looks good. +1

        Richard Ding added a comment -

        New patch to address the review comments.

        Thejas M Nair added a comment -

        Comments on the patch:

        • In SampleOptimizer.java, the code expects the sampling MR plan to have only one integer argument, which holds the number of reducers that will be used by the successor of the sampling job (order-by/skewed-join). We might not remember this assumption if we later change the sampling plan, so it would be safer to throw an error if more than one integer constant is seen in the plan.
        • In the test case, the expected number of reducers is computed dynamically and used for checking in the first scenario; it can be used in the last scenario as well.
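        The guard suggested in the first review comment might look like the minimal sketch below. It assumes the sampling plan's constants can be viewed as a flat list of Objects. SamplePlanCheck and indexOfReducerConstant are hypothetical names, not the real SampleOptimizer API. The method finds the single Integer constant so it could be overwritten with the estimated reducer count, and it fails fast if a second Integer appears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "only one integer constant" check; not Pig's code.
class SamplePlanCheck {
    /**
     * Returns the index of the single Integer constant, assumed to carry the
     * reducer count for the job that follows sampling (order-by/skewed-join).
     * Returns -1 if there is none (a convention chosen for this sketch) and
     * throws if there is more than one, since the right one cannot be chosen.
     */
    static int indexOfReducerConstant(List<Object> planConstants) {
        int index = -1;
        for (int i = 0; i < planConstants.size(); i++) {
            if (planConstants.get(i) instanceof Integer) {
                if (index != -1) {
                    throw new IllegalStateException(
                        "Sampling plan has more than one integer constant; "
                        + "cannot tell which one holds the reducer count");
                }
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Object> constants = new ArrayList<>();
        constants.add("quantiles"); // some non-integer argument
        constants.add(1);           // placeholder reducer count
        int idx = indexOfReducerConstant(constants);
        constants.set(idx, 3);      // overwrite with the estimated reducer count
        System.out.println(constants); // prints [quantiles, 3]
    }
}
```

        Failing loudly here turns a silent wrong reducer count into an immediate error if someone later adds another integer argument to the sampling plan.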
        Richard Ding added a comment -

        The patch passed test-core.

        The results of test-patch:

            [exec] +1 overall.
            [exec]
            [exec]     +1 @author.  The patch does not contain any @author tags.
            [exec]
            [exec]     +1 tests included.  The patch appears to include 8 new or modified tests.
            [exec]
            [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
            [exec]
            [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
            [exec]
            [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
            [exec]
            [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
        

          People

          • Assignee: Richard Ding
          • Reporter: Richard Ding
          • Votes: 0
          • Watchers: 1
