Details
- Improvement
- Status: Resolved
- Major
- Resolution: Duplicate
- 1.6.1
- None
- None
Description
It would be very useful to be able to disable, at runtime, the block of code within DynamicPartitionWriterContainer#writeRows that performs the sort step.
The use case is that an upstream groupBy has already sorted a great many fine-grained groups, which are the target of the partitionBy. The partitionBy uses the same keys as the groupBy, so sorting again is redundant. Currently, the job cannot even complete because of the sort step combined with data skew in the partitions. In general, skipping the redundant sort would make more efficient use of cluster resources.
For this to work, there needs to be a way to communicate between the groupBy and the partitionBy by way of some runtime configuration. This is very similar in function to Hive's hive.optimize.sort.dynamic.partition parameter.
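To illustrate the idea, here is a minimal sketch in plain Python, not Spark code. It assumes a hypothetical `sort_before_write` flag standing in for the requested runtime configuration, and a hypothetical `write_partitions` helper standing in for the writer. When the input is already clustered by the partition key, the writer can stream through it one partition at a time without sorting.

```python
from itertools import groupby
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[str, int]  # (partition key, value); an illustrative row shape


def write_partitions(
    rows: Iterable[Row],
    key: Callable[[Row], Hashable],
    sort_before_write: bool = True,  # hypothetical flag, not a real Spark setting
) -> Dict[Hashable, List[Row]]:
    """Collect rows per partition, handling one partition at a time."""
    if sort_before_write:
        # Default behaviour: sort so that equal keys become adjacent.
        # This materializes the whole input, which is the costly step.
        rows = sorted(rows, key=key)

    output: Dict[Hashable, List[Row]] = {}
    # groupby only merges adjacent rows with equal keys, so it relies on the
    # input being clustered (either by the sort above or by the caller).
    for partition_key, group in groupby(rows, key=key):
        if partition_key in output:
            # The caller promised clustered input but a key reappeared.
            raise ValueError(f"input is not clustered by key: {partition_key!r}")
        output[partition_key] = list(group)
    return output


# Rows as an upstream groupBy might emit them: clustered, not globally sorted.
clustered = [("b", 1), ("b", 2), ("a", 3)]
by_key = lambda row: row[0]

skipped_sort = write_partitions(clustered, by_key, sort_before_write=False)
with_sort = write_partitions(clustered, by_key, sort_before_write=True)
```

Both calls produce the same per-partition contents for clustered input; the flag only removes the sort. The sketch raises an error when the clustering promise is broken, which is one possible safeguard rather than a description of how Spark or Hive behaves.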
Issue Links
- duplicates SPARK-19563 advoid unnecessary sort in FileFormatWriter (Resolved)
- is related to HIVE-6455 Scalable dynamic partitioning and bucketing optimization (Closed)