Hive / HIVE-3640

Reducer allocation is incorrect if enforce bucketing and mapred.reduce.tasks are both set

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.10.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      When I enforce bucketing and fix the number of reducers via mapred.reduce.tasks, Hive ignores my input and instead takes the largest value <= hive.exec.reducers.max that is also an even divisor of num_buckets. In other words, if I set 1024 buckets and set mapred.reduce.tasks=1024, I'll get... 256 reducers. If I set 1997 buckets and set mapred.reduce.tasks=1997, I'll get... 1 reducer (1997 is prime, so the only divisor below it is 1).

      This is totally crazy, and it's far, far crazier when the data inputs get large. In the latter case the bucketing job will almost certainly fail because we'll most likely try to stuff several TB of input through a single reducer. We'll also drastically reduce the effectiveness of bucketing, since the buckets themselves will be larger.
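To make the rule described above concrete, here is a minimal Python sketch of "the largest value <= hive.exec.reducers.max that evenly divides num_buckets". It is an illustration of the behavior as reported, not Hive's actual code: the function name is hypothetical, and the cap of 256 is an assumption chosen only because it reproduces the 256 reducers quoted in the report (the report does not say which hive.exec.reducers.max value was in effect).

```python
def largest_divisor_at_most(num_buckets: int, max_reducers: int) -> int:
    """Return the largest divisor of num_buckets that does not exceed max_reducers.

    Hypothetical helper that mirrors the behavior described in the report;
    it is not taken from Hive's source.
    """
    if num_buckets < 1 or max_reducers < 1:
        raise ValueError("num_buckets and max_reducers must be positive")
    # Walk downward from the cap until a divisor is found; 1 always divides.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate
    return 1  # never reached, because the loop always returns at candidate == 1


# Assumed cap (hypothetical): the report does not state the configured value.
ASSUMED_MAX_REDUCERS = 256

print(largest_divisor_at_most(1024, ASSUMED_MAX_REDUCERS))  # 256
print(largest_divisor_at_most(1997, ASSUMED_MAX_REDUCERS))  # 1, since 1997 is prime
```

Under this rule a prime bucket count above the cap always collapses to a single reducer, which is the failure mode the report complains about.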

      If the user sets mapred.reduce.tasks in a query that inserts into a bucketed table, we should either accept that value or raise an exception if it's invalid relative to the number of buckets. We should absolutely NOT override the user's direction and fall back on automatically allocating reducers based on some obscure logic dictated by a completely different setting.

      I have yet to encounter a single person who expected this the first time, so it's clearly a bug.
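The behavior requested in this description (accept the user's value, or fail loudly) could look roughly like the following Python sketch. It is not the committed patch. The function name is hypothetical, `None` stands in for "mapred.reduce.tasks not set", and the validity rule (the requested count must evenly divide the number of buckets) is an assumption, since the description does not define what counts as invalid.

```python
from typing import Optional


def choose_reducers(num_buckets: int,
                    requested_reducers: Optional[int],
                    max_reducers: int) -> int:
    """Pick a reducer count for an insert into a bucketed table.

    Hypothetical sketch of the proposal: an explicit user request is either
    honored or rejected, and is never silently replaced.
    """
    if requested_reducers is not None:
        # Assumed validity rule: the request must evenly divide num_buckets.
        if requested_reducers < 1 or num_buckets % requested_reducers != 0:
            raise ValueError(
                f"{requested_reducers} reducers is invalid for {num_buckets} buckets"
            )
        return requested_reducers
    # No explicit request: fall back to the automatic rule from the report,
    # the largest divisor of num_buckets that does not exceed max_reducers.
    for candidate in range(min(num_buckets, max_reducers), 0, -1):
        if num_buckets % candidate == 0:
            return candidate
    return 1  # never reached, because 1 always divides num_buckets


print(choose_reducers(1997, 1997, 256))  # 1997: the user's value is honored
print(choose_reducers(1024, None, 256))  # 256: automatic allocation
```

The key design point is that the automatic rule only runs when the user gave no direction, so hive.exec.reducers.max never overrides an explicit mapred.reduce.tasks.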

      1. HIVE-3640.1.patch.txt
        5 kB
        Vighnesh Avadhani

        Activity

        Vighnesh Avadhani added a comment -

        Fixed the bug and added a corresponding unit test.

        Namit Jain added a comment -

        Can you create a review request using
        https://cwiki.apache.org/Hive/phabricatorcodereview.html ?

        Vighnesh Avadhani added a comment -

        Done: https://reviews.facebook.net/D6327
        I could not use arc diff --jira HIVE-3640 as it was throwing:
        PHP Fatal error: Call to undefined method ArcanistGitAPI::amendGitHeadCommit() in /Users/vighnesh/hive/.arc_jira_lib/arcanist/ArcJIRAConfiguration.php on line 173

        Kevin Wilfong added a comment -

        Couple minor comments on the diff.

        Could you also hit the "Submit Patch" button to mark the JIRA "Patch Available" so reviewers know to review the diff?

        Kevin Wilfong added a comment -

        +1

        Kevin Wilfong added a comment -

        Committed, thanks Vighnesh.

        Vighnesh Avadhani added a comment -

        Thanks for committing, Kevin.

        Hudson added a comment -

        Integrated in Hive-trunk-h0.21 #1771 (See https://builds.apache.org/job/Hive-trunk-h0.21/1771/)
        HIVE-3640. Reducer allocation is incorrect if enforce bucketing and mapred.reduce.tasks are both set. (Vighnesh Avadhani via kevinwilfong) (Revision 1405240)

        Result = FAILURE
        kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1405240
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyNumReducersForBucketsHook.java
        • /hive/trunk/ql/src/test/queries/clientpositive/bucket_num_reducers.q
        • /hive/trunk/ql/src/test/results/clientpositive/bucket_num_reducers.q.out
        Hudson added a comment -

        Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
        HIVE-3640. Reducer allocation is incorrect if enforce bucketing and mapred.reduce.tasks are both set. (Vighnesh Avadhani via kevinwilfong) (Revision 1405240)

        Result = ABORTED
        kevinwilfong : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1405240
        Files :

        • /hive/trunk/build-common.xml
        • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
        • /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/hooks/VerifyNumReducersForBucketsHook.java
        • /hive/trunk/ql/src/test/queries/clientpositive/bucket_num_reducers.q
        • /hive/trunk/ql/src/test/results/clientpositive/bucket_num_reducers.q.out
        Ashutosh Chauhan added a comment -

        This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


          People

          • Assignee:
            Vighnesh Avadhani
            Reporter:
            Vighnesh Avadhani
          • Votes:
            0
            Watchers:
            5

              Time Tracking

              Estimated:
              48h
              Remaining:
              48h
              Logged:
              Not Specified