Details
-
Improvement
-
Status: Open
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Hive tries to set number of reducers equal to number of buckets here:
https://github.com/apache/hive/blob/9857c4e584384f7b0a49c34bc2bdf876c2ea1503/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6958
The numberOfBuckets for implicitly bucketed tables is set to -1 by default. When this is the case, it is left to hive to estimate the number of reducers required the job, based on job input, and configuration parameters.
This estimate is not optimal in all cases. In the worst case, it case result in a single reducer being launched , which can lead to a significant bottleneck in performance .
Ideally, the number of reducers launched should equal to number of buckets, which is the case for explicitly bucketed tables.
Attachments
Issue Links
- relates to
-
HIVE-25611 OOM when running MERGE query on wide transactional table with many buckets
- Open
- links to