Spark / SPARK-33537 Hive Metastore filter pushdown improvement / SPARK-33707

Support multiple types of function partition pruning on hive metastore


    Details

    • Type: Sub-task
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

In the current version, partition pruning is only supported for a limited set of predicates.

      Let's look at the implementation of the source code:
      https://github.com/apache/spark/blob/031c5ef280e0cba8c4718a6457a44b6cccb17f46/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala#L840

Hive's getPartitionsByFilter() takes a string of partition predicates such as "str_key=\"value\" and int_key=1 ...", but predicates built from ordinary functions such as concat/concat_ws/substr are not supported and cannot be pushed down.

This defect can cause a large number of partitions to be scanned, which increases the amount of data involved in the computation and puts additional load on the metastore service.
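To make the limitation concrete, here is a minimal, hypothetical sketch (not Spark's actual code) in the spirit of the filter conversion done in HiveShim: only binary comparisons on partition columns are rendered into the metastore filter string, while anything involving a function call (e.g. substr(dt, 1, 7)) falls through, forcing all partitions to be fetched and pruned client-side. All type and object names here are illustrative assumptions.

```scala
// Hypothetical expression tree; real Spark uses Catalyst expressions.
sealed trait Expr
case class Attr(name: String) extends Expr
case class StrLit(value: String) extends Expr
case class IntLit(value: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Func(name: String, args: Seq[Expr]) extends Expr // e.g. substr(dt, 1, 7)

object FilterConverter {
  // Render a predicate as a Hive metastore filter string, or None if any
  // part of it cannot be expressed (the limitation this issue describes).
  def convert(e: Expr): Option[String] = e match {
    case EqualTo(Attr(a), StrLit(v)) => Some(a + "=\"" + v + "\"")
    case EqualTo(Attr(a), IntLit(v)) => Some(s"$a=$v")
    case And(l, r) =>
      for (ls <- convert(l); rs <- convert(r)) yield s"$ls and $rs"
    case _ => None // function calls etc. are not pushed down
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Pushed down as: str_key="value" and int_key=1
    println(FilterConverter.convert(
      And(EqualTo(Attr("str_key"), StrLit("value")),
          EqualTo(Attr("int_key"), IntLit(1)))))
    // Not pushed down: substr(dt, 1, 7) = "2020-12" yields None, so every
    // partition of the table must be listed from the metastore.
    println(FilterConverter.convert(
      EqualTo(Func("substr", Seq(Attr("dt"), IntLit(1), IntLit(7))),
              StrLit("2020-12"))))
  }
}
```

The improvement proposed here would extend the conversion so that such function-based predicates can also be translated (or partially translated) into metastore filter strings instead of falling through.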


        Attachments

          Activity

            People

            • Assignee: Unassigned
            • Reporter: southernriver chenliang
            • Votes: 0
            • Watchers: 6
