Details
-
Sub-task
-
Status: In Progress
-
Major
-
Resolution: Unresolved
-
3.2.0
-
None
-
None
Description
For the current version, partition pruning support is limited to the scene.
Let's look at the implementation of the source code:
https://github.com/apache/spark/blob/031c5ef280e0cba8c4718a6457a44b6cccb17f46/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L840
Hive getPartitionsByFilter() takes a string that represents partition predicates like "str_key=\"value\" and int_key=1 ...", but for normal functions like concat/concat_ws/substr,it does not support.
The defect can cause a large number of partitions to be scanned which will increase the amount of data involved in the calculation and increase the pressure of service of metastore.