  1. Hive
  2. HIVE-16923

Hive-on-Spark DPP Improvements

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      Improvements to Hive-on-Spark (HoS) dynamic partition pruning (DPP) so that it is production ready.

      Hive-on-Spark DPP was implemented in HIVE-9152, but it is disabled by default. The goal of this JIRA is to improve the DPP implementation so that it can be enabled by default.
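
      Because DPP is off by default, it has to be turned on per session (or in the site configuration) to exercise the behavior tracked here. A minimal sketch follows, assuming the `hive.spark.dynamic.partition.pruning` property introduced with HIVE-9152 and the map-join-only property named in sub-task 20. The values are illustrative and not a recommendation from this issue.

      ```sql
      -- Run the session on the Spark execution engine (assumed prerequisite for HoS DPP).
      SET hive.execution.engine=spark;

      -- Enable Hive-on-Spark dynamic partition pruning; the description above says it is disabled by default.
      SET hive.spark.dynamic.partition.pruning=true;

      -- Alternatively, restrict DPP to map-joins only (property named in sub-task 20).
      -- SET hive.spark.dynamic.partition.pruning.map.join.only=true;
      ```

      Running `EXPLAIN` on a join between a partitioned table and a filtered table is one way to check whether a Spark Partition Pruning Sink Operator appears in the plan.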

        Sub-Tasks

          1. Add config to enable HoS DPP only for map-joins (Sub-task, Closed, Janaki Lahorani)
          2. Remove unnecessary HoS DPP trees during map-join conversion (Sub-task, Closed, Sahil Takiar)
          3. Spark Partition Pruning Sink Operator can't target multiple Works (Sub-task, Closed, Rui Li)
          4. Additional qtests for HoS DPP (Sub-task, Closed, Sahil Takiar)
          5. NPE in SparkPartitionPruningSinkOperator#closeOp for query with partitioned join in subquery (Sub-task, Open, Sahil Takiar)
          6. HoS DPP pruning sink ops can target parallel work objects (Sub-task, Closed, Sahil Takiar)
          7. DPP isn't trigger for partitioned to partitioned join within a subquery (Sub-task, Open, Janaki Lahorani)
          8. HoS DPP: UDFs on the partition column side does not evaluate correctly (Sub-task, Closed, Sahil Takiar)
          9. Support DPP with map joins where the source and target belong in the same stage (Sub-task, Patch Available, Janaki Lahorani)
          10. Support Costing/Heuristics to enable or disable DPP (Sub-task, Open, Janaki Lahorani)
          11. HoS DPP ConstantPropagate should use ConstantPropagateOption.SHORTCUT (Sub-task, Closed, Sahil Takiar)
          12. HoS doesn't trigger mapjoins against subquery with union all (Sub-task, Open, Janaki Lahorani)
          13. HoS DPP + Vectorization generates invalid explain plan due to CombineEquivalentWorkResolver (Sub-task, Closed, liyunzhang)
          14. DynamicPartitionPruningOptimization doesn't log what filter triggered DPP (Sub-task, Open, Unassigned)
          15. SparkDynamicPartitionPruner loads all partition metadata into memory (Sub-task, Open, Janaki Lahorani)
          16. spark_dynamic_partition_pruning.q fails when hive.tez.dynamic.semijoin.reduction is false (Sub-task, Open, Sahil Takiar)
          17. SparkPartitionPruningSinkOperator buffers all writes in memory (Sub-task, Open, Janaki Lahorani)
          18. SparkPartitionPruner shouldn't be triggered by Spark tasks (Sub-task, Resolved, Sahil Takiar)
          19. DPP call to remove PartitionDescs from aliasToPartnInfo doesn't do anything (Sub-task, Open, Unassigned)
          20. Set hive.spark.dynamic.partition.pruning.map.join.only to true by default (Sub-task, Open, Unassigned)

            People

              Assignee: Sahil Takiar (stakiar)
              Reporter: Sahil Takiar (stakiar)
              Votes: 0
              Watchers: 7
