Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
When hive.optimize.sort.dynamic.partition.threshold is set to 0, the optimizer decides on its own whether to apply SortedDynPartitionOptimizer.
In production, we've seen it make the wrong decision for a simple INSERT..SELECT into a partitioned table when the data being inserted is skewed towards one partition.
In this case, it still applies SortedDynPartitionOptimizer. This forces a reducer step, and all the data for the skewed partition gets sent to the same reducer.
To reproduce this, you may also have to turn off stats autogathering (hive.stats.autogather), since that also creates a reducer step.
What we ultimately want is a map-only job, so that the load is evenly distributed across the mappers.
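To make the skew concrete, here is a toy model in Python. It is not Hive code: the function names, the row counts, and the use of CRC32 as the shuffle hash are all assumptions made for illustration. It only shows why routing rows by partition key concentrates a skewed insert on one reducer, while a map-only plan spreads the rows by input split.

```python
import zlib
from collections import Counter


def shuffle_by_partition_key(partition_keys, num_reducers):
    """Rows per reducer when each row is routed by a hash of its partition key.

    CRC32 is a stand-in for the real shuffle hash (an assumption of this sketch).
    """
    load = Counter()
    for key in partition_keys:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        load[reducer] += 1
    return [load[r] for r in range(num_reducers)]


def map_only(partition_keys, num_mappers):
    """Rows per mapper when rows are divided into near-equal input splits,
    regardless of which partition each row belongs to."""
    base, extra = divmod(len(partition_keys), num_mappers)
    return [base + (1 if m < extra else 0) for m in range(num_mappers)]


# Hypothetical skewed insert: 9,500 rows for one partition,
# 100 rows for each of five other partitions (10,000 rows in total).
rows = ["ds=2020-01-01"] * 9_500 + [
    f"ds=2020-01-{day:02d}" for day in range(2, 7) for _ in range(100)
]

reducer_rows = shuffle_by_partition_key(rows, num_reducers=4)
mapper_rows = map_only(rows, num_mappers=4)

print("rows per reducer (sorted dynamic partition):", reducer_rows)
print("rows per mapper  (map-only):                ", mapper_rows)
```

In this model the busiest reducer always receives at least the 9,500 rows of the hot partition, whatever the hash does with the other keys, while each mapper handles a quarter of the input.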