Hive / HIVE-8699 Enable support for common map join [Spark Branch] / HIVE-8859

ColumnStatsTask fails because of SparkMapJoinResolver [Spark Branch]

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: spark-branch
    • Fix Version/s: 1.1.0
    • Component/s: Spark
    • Labels: None

    Description

      The following query fails:

      ANALYZE TABLE src COMPUTE STATISTICS FOR COLUMNS key,value;
      

      The plan looks like:

      STAGE DEPENDENCIES:
        Stage-0 is a root stage
        Stage-2 is a root stage
      
      STAGE PLANS:
        Stage: Stage-0
          Spark
            Edges:
              Reducer 2 <- Map 1 (GROUP, 1)
            DagName: chao_20141113105959_486b4bba-a2da-43c5-bf42-0ee69cd42576:1
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: src
                        Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: key (type: string), value (type: string)
                          outputColumnNames: key, value
                          Statistics: Num rows: 500 Data size: 5312 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: compute_stats(key, 16), compute_stats(value, 16)
                            mode: hash
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                            Reduce Output Operator
                              sort order: 
                              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                              value expressions: _col0 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>), _col1 (type: struct<columntype:string,maxlength:bigint,sumlength:bigint,count:bigint,countnulls:bigint,bitvector:string,numbitvectors:int>)
              Reducer 2 
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: compute_stats(VALUE._col0), compute_stats(VALUE._col1)
                      mode: mergepartial
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>), _col1 (type: struct<columntype:string,maxlength:bigint,avglength:double,countnulls:bigint,numdistinctvalues:bigint>)
                        outputColumnNames: _col0, _col1
                        Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      
        Stage: Stage-2
          Column Stats Work
            Column Stats Desc:
                Columns: key, value
                Column Types: string, string
                Table: src
      

      The query fails because SparkMapJoinResolver#createSparkTask swaps the order of the two tasks in the root task list. This is surprising: if both are root tasks, their order should not matter.
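
      As an illustration only, the sketch below shows one way a list replacement can silently reorder root tasks. It is not the Hive code and not the attached patch: the class RootTaskOrder, both helper methods, and the task names are hypothetical, strings stand in for task objects, and the starting order of the list is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: plain strings stand in for Hive task objects.
public class RootTaskOrder {

    // Replaces oldTask by removing it and appending newTask.
    // The replacement lands at the END of the list, so relative order changes.
    static List<String> replaceByRemoveAndAdd(List<String> rootTasks,
                                              String oldTask, String newTask) {
        List<String> result = new ArrayList<>(rootTasks);
        result.remove(oldTask);
        result.add(newTask);
        return result;
    }

    // Replaces oldTask at its original index, so relative order is preserved.
    static List<String> replaceInPlace(List<String> rootTasks,
                                       String oldTask, String newTask) {
        List<String> result = new ArrayList<>(rootTasks);
        int index = result.indexOf(oldTask);
        if (index >= 0) {
            result.set(index, newTask);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed starting order, for illustration only.
        List<String> roots = Arrays.asList("sparkTask", "columnStatsTask");

        // Order changes: the new task moves behind columnStatsTask.
        System.out.println(replaceByRemoveAndAdd(roots, "sparkTask", "newSparkTask"));

        // Order is preserved: the new task keeps the old task's position.
        System.out.println(replaceInPlace(roots, "sparkTask", "newSparkTask"));
    }
}
```

      If a consumer of the root task list happens to depend on position, the first variant changes behavior even though every task is still nominally a root task.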

      Attachments

        1. HIVE-8859.2-spark.patch
          1 kB
          Chao Sun
        2. HIVE-8859.1-spark.patch
          1 kB
          Chao Sun

          People

            Assignee: Chao Sun (csun)
            Reporter: Chao Sun (csun)