Hive / HIVE-16923 Hive-on-Spark DPP Improvements / HIVE-16998

Add config to enable HoS DPP only for map-joins

Details

    Description

      HoS DPP (Hive-on-Spark dynamic partition pruning) splits a given operator tree in two when both of the following hold: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP).

      This can hurt performance if the non-partitioned side of the join involves a complex operator tree. For example, the query select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart) requires running the subquery twice, once in each Spark job.

      Queries with map-joins are not split into two operator trees, so they don't suffer from this drawback. It would therefore be useful to have a config key that enables DPP on HoS only for map-joins.
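
      A rough sketch of how such a key might be used in a Hive session is shown below. The key name hive.spark.dynamic.partition.pruning.map.join.only is an assumption here, since this description does not name the key; check the attached patches or HiveConf for the actual name and default. The query is the one from the description.

      ```sql
      -- Run queries on the Spark execution engine.
      set hive.execution.engine=spark;

      -- Allow Hive to convert eligible joins into map-joins.
      set hive.auto.convert.join=true;

      -- ASSUMED key name (not stated in this issue description):
      -- enable DPP only where the pruning feeds a map-join, so the
      -- operator tree is not split into two Spark jobs.
      set hive.spark.dynamic.partition.pruning.map.join.only=true;

      -- Inspect the plan to see whether the subquery runs once or twice.
      explain
      select count(*)
      from srcpart
      where srcpart.ds in (
        select max(srcpart.ds) from srcpart
        union all
        select min(srcpart.ds) from srcpart
      );
      ```

      With a setting like this, a query whose join is not converted to a map-join would be expected to run without DPP rather than pay for evaluating the subquery in two separate Spark jobs.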

      Attachments

        1. HIVE16998.1.patch
          12 kB
          Janaki Lahorani
        2. HIVE16998.2.patch
          42 kB
          Janaki Lahorani
        3. HIVE16998.3.patch
          44 kB
          Janaki Lahorani
        4. HIVE16998.4.patch
          48 kB
          Janaki Lahorani
        5. HIVE16998.5.patch
          48 kB
          Janaki Lahorani

          People

            janulatha Janaki Lahorani
            stakiar Sahil Takiar
            Votes: 0
            Watchers: 9