  Hive / HIVE-16923 Hive-on-Spark DPP Improvements / HIVE-16998

Add config to enable HoS DPP only for map-joins


Details

    Description

      HoS DPP splits a given operator tree into two when both of the following conditions hold: it has detected that the query can benefit from DPP, and the filter is not a map-join (see SplitOpTreeForDPP).

      This can hurt performance if the non-partitioned side of the join involves a complex operator tree. For example, the query select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart) requires running the subquery twice, once in each Spark job.

      Queries with map-joins are not split into two operator trees and therefore do not suffer from this drawback. It would thus be useful to have a config key that enables DPP on HoS only for map-joins.
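      The planning decision described above can be modeled as a small function. The Java sketch below is hypothetical: the class DppPlanSketch, the Outcome enum, and the mapJoinOnly flag (standing in for the proposed config key) are illustrative assumptions, not Hive's actual API or the logic of SplitOpTreeForDPP. It shows that the proposed key only changes the non-map-join case; map-joins keep DPP either way.

```java
// Hypothetical model of the HoS DPP planning decision discussed in this issue.
// None of these names exist in Hive; they only illustrate the proposed behavior.
final class DppPlanSketch {

    /** Possible planning outcomes for a join that is a DPP candidate. */
    enum Outcome {
        NO_DPP,              // DPP is not applied at all
        DPP_MAP_JOIN,        // DPP applied; the map-join keeps a single operator tree
        DPP_WITH_TREE_SPLIT  // DPP applied; the operator tree is split into two Spark jobs
    }

    /**
     * @param dppBeneficial whether the planner detected that the query can benefit from DPP
     * @param isMapJoin     whether the join is executed as a map-join
     * @param mapJoinOnly   value of the proposed (hypothetical) config key
     */
    static Outcome decide(boolean dppBeneficial, boolean isMapJoin, boolean mapJoinOnly) {
        if (!dppBeneficial) {
            return Outcome.NO_DPP;
        }
        if (isMapJoin) {
            // Map-joins are never split, so DPP adds no duplicated work.
            return Outcome.DPP_MAP_JOIN;
        }
        // Non-map-join: DPP requires splitting the tree, which re-runs the
        // non-partitioned side in each Spark job. The proposed key disables this case.
        return mapJoinOnly ? Outcome.NO_DPP : Outcome.DPP_WITH_TREE_SPLIT;
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean isMapJoin : flags) {
            for (boolean mapJoinOnly : flags) {
                System.out.printf("mapJoin=%b, mapJoinOnly=%b -> %s%n",
                        isMapJoin, mapJoinOnly, decide(true, isMapJoin, mapJoinOnly));
            }
        }
    }
}
```

      Under this model, and assuming the srcpart query from the description is planned as a common (non-map) join, enabling the key would run it without DPP instead of evaluating the union-all subquery twice.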

      Attachments

        1. HIVE16998.1.patch
          12 kB
          Janaki Lahorani
        2. HIVE16998.2.patch
          42 kB
          Janaki Lahorani
        3. HIVE16998.3.patch
          44 kB
          Janaki Lahorani
        4. HIVE16998.4.patch
          48 kB
          Janaki Lahorani
        5. HIVE16998.5.patch
          48 kB
          Janaki Lahorani


          People

            Assignee: Janaki Lahorani (janulatha)
            Reporter: Sahil Takiar (stakiar)
            Votes: 0
            Watchers: 9

