  Hive / HIVE-16923 Hive-on-Spark DPP Improvements / HIVE-16998

Add config to enable HoS DPP only for map-joins

    Details

      Description

      Hive-on-Spark (HoS) dynamic partition pruning (DPP) splits a given operator tree in two when both of the following conditions hold: it has detected that the query can benefit from DPP, and the join in question is not a map-join (see SplitOpTreeForDPP).

      This can hurt performance if the non-partitioned side of the join involves a complex operator tree. For example, the query select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart) requires running the subquery twice, once in each Spark job.

      Queries with map-joins are not split into two operator trees and so do not suffer from this drawback. A config key that enables DPP on HoS only for map-joins would therefore be useful.
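
      The behavior requested above can be sketched as a small decision function. This is only an illustrative model, not Hive's actual code: the class DppPolicy, the Decision enum, and the mapJoinOnly flag are hypothetical names standing in for the proposed config key, and the real check lives in the optimizer around SplitOpTreeForDPP.

```java
// Hypothetical model of the proposed behavior; none of these names exist in Hive.
public class DppPolicy {

    /** Possible outcomes for a join that is a DPP candidate. */
    enum Decision { NO_DPP, DPP_WITHOUT_SPLIT, DPP_WITH_SPLIT }

    /**
     * @param benefitsFromDpp whether the query was detected to benefit from DPP
     * @param isMapJoin       whether the join is a map-join
     * @param mapJoinOnly     assumed value of the proposed "map-joins only" config key
     */
    static Decision decide(boolean benefitsFromDpp, boolean isMapJoin, boolean mapJoinOnly) {
        if (!benefitsFromDpp) {
            return Decision.NO_DPP;
        }
        if (isMapJoin) {
            // Map-joins are not split into two operator trees.
            return Decision.DPP_WITHOUT_SPLIT;
        }
        // Non-map-join: splitting would run the small side once per Spark job,
        // so the proposed flag skips DPP entirely in this case.
        return mapJoinOnly ? Decision.NO_DPP : Decision.DPP_WITH_SPLIT;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, false)); // DPP_WITH_SPLIT
        System.out.println(decide(true, false, true));  // NO_DPP
        System.out.println(decide(true, true, true));   // DPP_WITHOUT_SPLIT
    }
}
```

      With the flag off, the current behavior is unchanged; with it on, only the non-map-join case differs, which is the case where the subquery would otherwise run twice.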

        Attachments

        1. HIVE16998.5.patch
          48 kB
          Janaki Lahorani
        2. HIVE16998.4.patch
          48 kB
          Janaki Lahorani
        3. HIVE16998.3.patch
          44 kB
          Janaki Lahorani
        4. HIVE16998.2.patch
          42 kB
          Janaki Lahorani
        5. HIVE16998.1.patch
          12 kB
          Janaki Lahorani

            People

            • Assignee:
              janulatha Janaki Lahorani
              Reporter:
              stakiar Sahil Takiar
            • Votes:
              0
              Watchers:
              9
