Hive / HIVE-8699 Enable support for common map join [Spark Branch] / HIVE-8842

auto_join2.q produces incorrect tree [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      With SparkMapJoinResolver and SparkReduceSinkMapJoinProc enabled, the following query:

      explain select * from src src1 JOIN src src2 ON (src1.key = src2.key) JOIN src src3 ON (src1.key + src2.key = src3.key);
      

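      produces too many stages (six) and too many HashTable Sink operators (four). Stage-7 and Stage-6 are verbatim copies of Stage-5 and Stage-4, and no other stage depends on Stage-6: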

      STAGE DEPENDENCIES:
        Stage-5 is a root stage
        Stage-4 depends on stages: Stage-5
        Stage-3 depends on stages: Stage-4
        Stage-7 is a root stage
        Stage-6 depends on stages: Stage-7
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-5
          Spark
            DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: src2
                        Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                          HashTable Sink Operator
                            condition expressions:
                              0 {key} {value}
                              1 {key} {value}
                            keys:
                              0 key (type: string)
                              1 key (type: string)
      
        Stage: Stage-4
          Spark
            DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
            Vertices:
              Map 3 
                  Map Operator Tree:
                      TableScan
                        alias: src1
                        Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {key} {value}
                              1 {key} {value}
                            keys:
                              0 key (type: string)
                              1 key (type: string)
                            outputColumnNames: _col0, _col1, _col5, _col6
                            input vertices:
                              1 Map 1
                            Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                            Filter Operator
                              predicate: (_col0 + _col5) is not null (type: boolean)
                              Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                              HashTable Sink Operator
                                condition expressions:
                                  0 {_col0} {_col1} {_col5} {_col6}
                                  1 {key} {value}
                                keys:
                                  0 (_col0 + _col5) (type: double)
                                  1 UDFToDouble(key) (type: double)
      
        Stage: Stage-3
          Spark
            DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:1
            Vertices:
              Map 2 
                  Map Operator Tree:
                      TableScan
                        alias: src3
                        Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: UDFToDouble(key) is not null (type: boolean)
                          Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {_col0} {_col1} {_col5} {_col6}
                              1 {key} {value}
                            keys:
                              0 (_col0 + _col5) (type: double)
                              1 UDFToDouble(key) (type: double)
                            outputColumnNames: _col0, _col1, _col5, _col6, _col10, _col11
                            input vertices:
                              0 Map 3
                            Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                            Select Operator
                              expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string), _col6 (type: string), _col10 (type: string), _col11 (type: string)
                              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
                              Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                              File Output Operator
                                compressed: false
                                Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                                table:
                                    input format: org.apache.hadoop.mapred.TextInputFormat
                                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-7
          Spark
            DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:3
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: src2
                        Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                          HashTable Sink Operator
                            condition expressions:
                              0 {key} {value}
                              1 {key} {value}
                            keys:
                              0 key (type: string)
                              1 key (type: string)
      
        Stage: Stage-6
          Spark
            DagName: szehon_20141112105656_dd50e07d-94ad-4f9d-899e-bcb6d9a39c13:2
            Vertices:
              Map 3 
                  Map Operator Tree:
                      TableScan
                        alias: src1
                        Statistics: Num rows: 29 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Filter Operator
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 15 Data size: 3006 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            condition expressions:
                              0 {key} {value}
                              1 {key} {value}
                            keys:
                              0 key (type: string)
                              1 key (type: string)
                            outputColumnNames: _col0, _col1, _col5, _col6
                            input vertices:
                              1 Map 1
                            Statistics: Num rows: 16 Data size: 3306 Basic stats: COMPLETE Column stats: NONE
                            Filter Operator
                              predicate: (_col0 + _col5) is not null (type: boolean)
                              Statistics: Num rows: 8 Data size: 1653 Basic stats: COMPLETE Column stats: NONE
                              HashTable Sink Operator
                                condition expressions:
                                  0 {_col0} {_col1} {_col5} {_col6}
                                  1 {key} {value}
                                keys:
                                  0 (_col0 + _col5) (type: double)
                                  1 UDFToDouble(key) (type: double)
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
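
      The duplication in the plan above can also be checked mechanically. The following is a minimal sketch, not part of Hive: the helper names (split_stages, duplicate_stage_groups, count_operator) are hypothetical, and SAMPLE is a heavily abbreviated copy of the plan above. It splits EXPLAIN output into stage blocks, groups stages whose bodies are textually identical, and counts HashTable Sink operators.

```python
import re
from collections import defaultdict


def split_stages(plan_text):
    """Return {stage_name: body_text} for each 'Stage: Stage-N' block."""
    stages = {}
    current = None
    for line in plan_text.splitlines():
        match = re.match(r"\s*Stage: (Stage-\d+)\s*$", line)
        if match:
            current = match.group(1)
            stages[current] = []
        elif current is not None and line.strip():
            # Strip indentation and trailing spaces so only content is compared.
            stages[current].append(line.strip())
    return {name: "\n".join(body) for name, body in stages.items()}


def duplicate_stage_groups(stages):
    """Group the names of stages whose bodies are textually identical."""
    by_body = defaultdict(list)
    for name, body in stages.items():
        by_body[body].append(name)
    return [names for names in by_body.values() if len(names) > 1]


def count_operator(stages, operator="HashTable Sink Operator"):
    """Count occurrences of an operator name across all stage bodies."""
    return sum(body.count(operator) for body in stages.values())


# Abbreviated from the plan in this issue (assumption: most detail omitted).
SAMPLE = """\
STAGE PLANS:
  Stage: Stage-5
    Spark
      Vertices:
        Map 1
          TableScan
            alias: src2
            HashTable Sink Operator

  Stage: Stage-4
    Spark
      Vertices:
        Map 3
          TableScan
            alias: src1
            Map Join Operator
              HashTable Sink Operator

  Stage: Stage-3
    Spark
      Vertices:
        Map 2
          TableScan
            alias: src3
            Map Join Operator
              File Output Operator

  Stage: Stage-7
    Spark
      Vertices:
        Map 1
          TableScan
            alias: src2
            HashTable Sink Operator

  Stage: Stage-6
    Spark
      Vertices:
        Map 3
          TableScan
            alias: src1
            Map Join Operator
              HashTable Sink Operator

  Stage: Stage-0
    Fetch Operator
"""

if __name__ == "__main__":
    stages = split_stages(SAMPLE)
    print(duplicate_stage_groups(stages))
    print(count_operator(stages))
```

      On the abbreviated sample this reports the pairs Stage-5/Stage-7 and Stage-4/Stage-6 and four HashTable Sink operators. The comparison is purely textual, so it only flags stages that are exact copies of each other, which is the case in the plan above.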
      

      Attachments

        1. HIVE-8842.3-spark.patch (3 kB, Chao Sun)
        2. HIVE-8842.2-spark.patch (4 kB, Chao Sun)
        3. HIVE-8842.1-spark.patch (2 kB, Chao Sun)

            People

              Assignee: Chao Sun (csun)
              Reporter: Szehon Ho (szehon)
              Votes: 0
              Watchers: 4
