Hive / HIVE-16923 (Hive-on-Spark DPP Improvements) / HIVE-17396

Support DPP with map joins where the source and target belong in the same stage


Details

    • Type: Sub-task
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      When the target work of a Spark Partition Pruning Sink Operator is not the same as the target of the Spark HashTable Sink Operator, the DPP source and its target get scheduled within the same Spark job. The target work can then try to read pruning output that the source has not yet written, which can result in a FileNotFoundException. HIVE-17225 fixes this by disabling DPP in that scenario; this JIRA is to support DPP for such cases instead.
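The failing pattern can be sketched as a check over a work-to-stage mapping. This is a minimal Python sketch, not Hive's implementation: the `Work` type and the `same_stage_dpp` helper are hypothetical, and the stage assignments are copied from the EXPLAIN plan of the test case (Map 1 and Map 2 in Stage-2, the map join in Map 3 in Stage-1).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Work:
    """Hypothetical model of one work (vertex) in a Hive-on-Spark plan."""
    name: str
    stage: str                               # Spark stage/job the work runs in
    dpp_targets: Tuple[str, ...] = ()        # works its pruning sinks target
    hash_table_target: Optional[str] = None  # work that consumes its hash table


def same_stage_dpp(works: List[Work]) -> List[Tuple[str, str]]:
    """Return (source, target) DPP edges that match the problematic pattern."""
    by_name = {w.name: w for w in works}
    risky = []
    for w in works:
        for target in w.dpp_targets:
            # The DPP target is not the map join work fed by this hash table...
            differs = target != w.hash_table_target
            # ...and it sits in the same stage as the source, so nothing orders
            # the pruning-output write before the target's read.
            if differs and by_name[target].stage == w.stage:
                risky.append((w.name, target))
    return risky


# Stage assignments copied from the EXPLAIN plan of the test case.
plan = [
    Work("Map 1", "Stage-2", dpp_targets=("Map 2",), hash_table_target="Map 3"),
    Work("Map 2", "Stage-2", hash_table_target="Map 3"),
    Work("Map 3", "Stage-1"),
]
print(same_stage_dpp(plan))  # [('Map 1', 'Map 2')]
```

Under this model, the HIVE-17225 behaviour amounts to disabling DPP for every edge the check reports.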

      Test Case:
      SET hive.spark.dynamic.partition.pruning=true;
      SET hive.auto.convert.join=true;
      SET hive.strict.checks.cartesian.product=false;

      CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int);
      CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int);

      CREATE TABLE reg_table (col int);

      ALTER TABLE part_table1 ADD PARTITION (part1_col = 1);

      ALTER TABLE part_table2 ADD PARTITION (part2_col = 1);
      ALTER TABLE part_table2 ADD PARTITION (part2_col = 2);

      INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1);

      INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1);
      INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2);

      INSERT INTO TABLE reg_table VALUES (1), (2), (3), (4), (5), (6);

      EXPLAIN SELECT *
      FROM part_table1 pt1,
      part_table2 pt2,
      reg_table rt
      WHERE rt.col = pt1.part1_col
      AND pt2.part2_col = pt1.part1_col;
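What DPP buys on this test data can be checked by hand. In the plan that follows, Map 1 collects the distinct part1_col values of pt1 (the hash-mode Group By) and sends them to the pruning sink whose target work is Map 2, the scan of pt2. A minimal Python sketch of that effect, assuming each table is modelled as a hypothetical in-memory list of rows (this is not how Hive stores or evaluates them):

```python
# Rows as inserted by the test case: (col, partition value).
part_table1 = [(1, 1)]          # one partition: part1_col = 1
part_table2 = [(1, 1), (2, 2)]  # two partitions: part2_col = 1 and 2
reg_table = [1, 2, 3, 4, 5, 6]

# DPP source: distinct join-key values from pt1.
pruning_values = {part1_col for _, part1_col in part_table1}

# DPP target: only partitions of part_table2 whose key survives are scanned.
all_partitions = {part2_col for _, part2_col in part_table2}
scanned = sorted(all_partitions & pruning_values)
pruned = sorted(all_partitions - pruning_values)
print("scanned:", scanned, "pruned:", pruned)  # scanned: [1] pruned: [2]


def join(pt2_rows):
    """SELECT * with rt.col = pt1.part1_col AND pt2.part2_col = pt1.part1_col."""
    return [
        (c1, p1, c2, p2, rc)
        for c1, p1 in part_table1
        for c2, p2 in pt2_rows
        for rc in reg_table
        if rc == p1 and p2 == p1
    ]


pruned_rows = [row for row in part_table2 if row[1] in pruning_values]
result = join(pruned_rows)
print(result)  # [(1, 1, 1, 1, 1)]
# Pruning skips partition part2_col = 2 without changing the query result.
assert result == join(part_table2)
```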

      Plan:
      STAGE DEPENDENCIES:
        Stage-2 is a root stage
        Stage-1 depends on stages: Stage-2
        Stage-0 depends on stages: Stage-1

      STAGE PLANS:
        Stage: Stage-2
          Spark
      #### A masked pattern was here ####
            Vertices:
              Map 1
                  Map Operator Tree:
                      TableScan
                        alias: pt1
                        Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: col (type: int), part1_col (type: int)
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                          Spark HashTable Sink Operator
                            keys:
                              0 _col1 (type: int)
                              1 _col1 (type: int)
                              2 _col0 (type: int)
                          Select Operator
                            expressions: _col1 (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                            Group By Operator
                              keys: _col0 (type: int)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                              Spark Partition Pruning Sink Operator
                                Target column: part2_col (int)
                                partition key expr: part2_col
                                Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
                                target work: Map 2
                  Local Work:
                    Map Reduce Local Work
              Map 2
                  Map Operator Tree:
                      TableScan
                        alias: pt2
                        Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: col (type: int), part2_col (type: int)
                          outputColumnNames: _col0, _col1
                          Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                          Spark HashTable Sink Operator
                            keys:
                              0 _col1 (type: int)
                              1 _col1 (type: int)
                              2 _col0 (type: int)
                  Local Work:
                    Map Reduce Local Work

        Stage: Stage-1
          Spark
      #### A masked pattern was here ####
            Vertices:
              Map 3
                  Map Operator Tree:
                      TableScan
                        alias: rt
                        Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: col is not null (type: boolean)
                          Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                          Select Operator
                            expressions: col (type: int)
                            outputColumnNames: _col0
                            Statistics: Num rows: 6 Data size: 6 Basic stats: COMPLETE Column stats: NONE
                            Map Join Operator
                              condition map:
                                   Inner Join 0 to 1
                                   Inner Join 0 to 2
                              keys:
                                0 _col1 (type: int)
                                1 _col1 (type: int)
                                2 _col0 (type: int)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4
                              input vertices:
                                0 Map 1
                                1 Map 2
                              Statistics: Num rows: 13 Data size: 13 Basic stats: COMPLETE Column stats: NONE
                              File Output Operator
                                compressed: false
                                Statistics: Num rows: 13 Data size: 13 Basic stats: COMPLETE Column stats: NONE
                                table:
                                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                  Local Work:
                    Map Reduce Local Work

        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink

      Attachments

        1. HIVE-17396.1.patch (96 kB, Janaki Lahorani)
        2. HIVE-17396.2.patch (65 kB, Janaki Lahorani)
        3. HIVE-17396.3.patch (65 kB, Janaki Lahorani)
        4. HIVE-17396.4.patch (65 kB, Janaki Lahorani)
        5. HIVE-17396.5.patch (65 kB, Janaki Lahorani)
        6. HIVE-17396.6.patch (66 kB, Janaki Lahorani)
        7. HIVE-17396.7.patch (65 kB, Janaki Lahorani)
        8. HIVE-17396.8.patch (65 kB, Janaki Lahorani)
        9. HIVE-17396.9.patch (65 kB, Janaki Lahorani)


      People

        Assignee: Janaki Lahorani (janulatha)
        Reporter: Janaki Lahorani (janulatha)
