Hive / HIVE-22489

Reduce Sink operator should order nulls by parameter


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: Query Planning
    • Labels:
      None
    • Target Version/s:

      Description

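      When the property hive.default.nulls.last is set to true and the ORDER BY clause does not specify a null ordering explicitly, nulls should be ordered last (NULLS LAST).
      However, some Reduce Sink operators still order nulls first. In the plan below, the Reduce Output Operators in Map 1 and Map 4 (feeding the join) show "null sort order: a" (nulls first), while the one in Reducer 2 (feeding the ORDER BY) shows "null sort order: z" (nulls last).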
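The expected behaviour can be sketched as a small resolution rule: an explicit NULLS FIRST/NULLS LAST wins, otherwise the hive.default.nulls.last setting decides, and the result is encoded per key column as 'a' (nulls first) or 'z' (nulls last), as in the "null sort order:" lines of EXPLAIN output. This is a minimal illustration only; the class, enum and method names (NullOrderSketch, NullSpec, nullOrderChar, nullOrderString, keyComparator) are hypothetical and are not Hive's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch (not Hive's API): resolve the per-key null sort order
 * string ('a' = nulls first, 'z' = nulls last) for a Reduce Sink.
 */
public class NullOrderSketch {

    // Null ordering written in the query, or DEFAULT when the ORDER BY omits it.
    enum NullSpec { FIRST, LAST, DEFAULT }

    // An explicit spec wins; otherwise fall back to hive.default.nulls.last.
    static char nullOrderChar(NullSpec spec, boolean defaultNullsLast) {
        switch (spec) {
            case FIRST:
                return 'a';
            case LAST:
                return 'z';
            default:
                return defaultNullsLast ? 'z' : 'a';
        }
    }

    // Builds one character per key column, like "null sort order: az".
    static String nullOrderString(List<NullSpec> specs, boolean defaultNullsLast) {
        StringBuilder sb = new StringBuilder();
        for (NullSpec s : specs) {
            sb.append(nullOrderChar(s, defaultNullsLast));
        }
        return sb.toString();
    }

    // Comparator for a single string key that places nulls per the resolved character.
    static Comparator<String> keyComparator(char nullOrder) {
        Comparator<String> natural = Comparator.naturalOrder();
        return nullOrder == 'z' ? Comparator.nullsLast(natural) : Comparator.nullsFirst(natural);
    }

    public static void main(String[] args) {
        boolean defaultNullsLast = true; // corresponds to SET hive.default.nulls.last=true;
        String order = nullOrderString(Arrays.asList(NullSpec.DEFAULT), defaultNullsLast);
        System.out.println("null sort order: " + order); // prints "null sort order: z"

        List<String> keys = new ArrayList<>(Arrays.asList("10", null, "0"));
        keys.sort(keyComparator(order.charAt(0)));
        System.out.println(keys); // prints "[0, 10, null]"
    }
}
```

Under this rule, the join keys in the query below (no explicit null ordering, default set to nulls last) would resolve to 'z' on every Reduce Sink, not only on the one feeding the ORDER BY.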

      SET hive.default.nulls.last=true;
      
      EXPLAIN EXTENDED
      SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key;
      
      PREHOOK: query: EXPLAIN EXTENDED
      SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
      PREHOOK: type: QUERY
      PREHOOK: Input: default@src
      #### A masked pattern was here ####
      POSTHOOK: query: EXPLAIN EXTENDED
      SELECT src1.key, src2.value FROM src src1 JOIN src src2 ON (src1.key = src2.key) ORDER BY src1.key
      POSTHOOK: type: QUERY
      POSTHOOK: Input: default@src
      #### A masked pattern was here ####
      OPTIMIZED SQL: SELECT `t0`.`key`, `t2`.`value`
      FROM (SELECT `key`
      FROM `default`.`src`
      WHERE `key` IS NOT NULL) AS `t0`
      INNER JOIN (SELECT `key`, `value`
      FROM `default`.`src`
      WHERE `key` IS NOT NULL) AS `t2` ON `t0`.`key` = `t2`.`key`
      ORDER BY `t0`.`key`
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
      #### A masked pattern was here ####
            Edges:
              Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 4 (SIMPLE_EDGE)
              Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
      #### A masked pattern was here ####
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: src1
                        filterExpr: key is not null (type: boolean)
                        Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                        GatherStats: false
                        Filter Operator
                          isSamplingPred: false
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: key (type: string)
                            outputColumnNames: _col0
                            Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: string)
                              null sort order: a
                              sort order: +
                              Map-reduce partition columns: _col0 (type: string)
                              Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                              tag: 0
                              auto parallelism: true
                  Execution mode: vectorized, llap
                  LLAP IO: no inputs
                  Path -> Alias:
      #### A masked pattern was here ####
                  Path -> Partition:
      #### A masked pattern was here ####
                      Partition
                        base file name: src
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        properties:
                          COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                          bucket_count -1
                          bucketing_version 2
                          column.name.delimiter ,
                          columns key,value
                          columns.comments 'default','default'
                          columns.types string:string
      #### A masked pattern was here ####
                          name default.src
                          numFiles 1
                          numRows 500
                          rawDataSize 5312
                          serialization.ddl struct src { string key, string value}
                          serialization.format 1
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          totalSize 5812
      #### A masked pattern was here ####
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          properties:
                            COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                            bucket_count -1
                            bucketing_version 2
                            column.name.delimiter ,
                            columns key,value
                            columns.comments 'default','default'
                            columns.types string:string
      #### A masked pattern was here ####
                            name default.src
                            numFiles 1
                            numRows 500
                            rawDataSize 5312
                            serialization.ddl struct src { string key, string value}
                            serialization.format 1
                            serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            totalSize 5812
      #### A masked pattern was here ####
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.src
                        name: default.src
                  Truncated Path -> Alias:
                    /src [src1]
              Map 4 
                  Map Operator Tree:
                      TableScan
                        alias: src2
                        filterExpr: key is not null (type: boolean)
                        Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                        GatherStats: false
                        Filter Operator
                          isSamplingPred: false
                          predicate: key is not null (type: boolean)
                          Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: key (type: string), value (type: string)
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                            Reduce Output Operator
                              key expressions: _col0 (type: string)
                              null sort order: a
                              sort order: +
                              Map-reduce partition columns: _col0 (type: string)
                              Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                              tag: 1
                              value expressions: _col1 (type: string)
                              auto parallelism: true
                  Execution mode: vectorized, llap
                  LLAP IO: no inputs
                  Path -> Alias:
      #### A masked pattern was here ####
                  Path -> Partition:
      #### A masked pattern was here ####
                      Partition
                        base file name: src
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        properties:
                          COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                          bucket_count -1
                          bucketing_version 2
                          column.name.delimiter ,
                          columns key,value
                          columns.comments 'default','default'
                          columns.types string:string
      #### A masked pattern was here ####
                          name default.src
                          numFiles 1
                          numRows 500
                          rawDataSize 5312
                          serialization.ddl struct src { string key, string value}
                          serialization.format 1
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          totalSize 5812
      #### A masked pattern was here ####
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          properties:
                            COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"key":"true","value":"true"}}
                            bucket_count -1
                            bucketing_version 2
                            column.name.delimiter ,
                            columns key,value
                            columns.comments 'default','default'
                            columns.types string:string
      #### A masked pattern was here ####
                            name default.src
                            numFiles 1
                            numRows 500
                            rawDataSize 5312
                            serialization.ddl struct src { string key, string value}
                            serialization.format 1
                            serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            totalSize 5812
      #### A masked pattern was here ####
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.src
                        name: default.src
                  Truncated Path -> Alias:
                    /src [src2]
              Reducer 2 
                  Execution mode: llap
                  Needs Tagging: false
                  Reduce Operator Tree:
                    Merge Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 _col0 (type: string)
                        1 _col0 (type: string)
                      outputColumnNames: _col0, _col2
                      Position of Big Table: 1
                      Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                      Select Operator
                        expressions: _col0 (type: string), _col2 (type: string)
                        outputColumnNames: _col0, _col1
                        Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                        Reduce Output Operator
                          key expressions: _col0 (type: string)
                          null sort order: z
                          sort order: +
                          Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                          tag: -1
                          value expressions: _col1 (type: string)
                          auto parallelism: false
              Reducer 3 
                  Execution mode: vectorized, llap
                  Needs Tagging: false
                  Reduce Operator Tree:
                    Select Operator
                      expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
                      File Output Operator
                        compressed: false
                        GlobalTableId: 0
      #### A masked pattern was here ####
                        NumFilesPerFileSink: 1
                        Statistics: Num rows: 791 Data size: 140798 Basic stats: COMPLETE Column stats: COMPLETE
      #### A masked pattern was here ####
                        table:
                            input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                            properties:
                              columns _col0,_col1
                              columns.types string:string
                              escape.delim \
                              hive.serialization.extend.additional.nesting.levels true
                              serialization.escape.crlf true
                              serialization.format 1
                              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                        TotalFiles: 1
                        GatherStats: false
                        MultiFileSpray: false
      
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              ListSink
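
A regression check for this issue could scan the EXPLAIN output and collect every "null sort order:" value that is not all 'z'. The sketch below assumes the query has no explicit NULLS FIRST/NULLS LAST and runs with hive.default.nulls.last=true; under those assumptions any 'a' indicates a Reduce Sink that ignores the setting. The class and method names (ExplainNullOrderCheck, nonNullsLastOrders) are hypothetical and not part of Hive's test framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: returns every "null sort order:" value in EXPLAIN
 * output that does not use 'z' (nulls last) for all key columns.
 */
public class ExplainNullOrderCheck {

    // Matches lines such as "null sort order: az" and captures the value.
    private static final Pattern NULL_SORT_ORDER =
        Pattern.compile("null sort order:\\s*(\\S+)");

    static List<String> nonNullsLastOrders(String explainOutput) {
        List<String> offending = new ArrayList<>();
        Matcher m = NULL_SORT_ORDER.matcher(explainOutput);
        while (m.find()) {
            String order = m.group(1);
            // With hive.default.nulls.last=true every key column should be 'z'.
            if (!order.chars().allMatch(c -> c == 'z')) {
                offending.add(order);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Excerpt of the three Reduce Output Operators from the plan above.
        String plan = String.join("\n",
            "Reduce Output Operator",
            "  key expressions: _col0 (type: string)",
            "  null sort order: a",
            "  sort order: +",
            "Reduce Output Operator",
            "  key expressions: _col0 (type: string)",
            "  null sort order: a",
            "  sort order: +",
            "Reduce Output Operator",
            "  key expressions: _col0 (type: string)",
            "  null sort order: z",
            "  sort order: +");
        System.out.println(nonNullsLastOrders(plan)); // prints "[a, a]"
    }
}
```

Applied to the plan above, the check flags the two map-side Reduce Output Operators (Map 1 and Map 4) and accepts the one in Reducer 2.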
      

        Attachments

        1. HIVE-22489.1.patch, 24 kB
        2. HIVE-22489.2.patch, 25 kB
        3. HIVE-22489.3.patch, 1.26 MB (attached twice)
        4. HIVE-22489.4.patch, 6.98 MB
        5. HIVE-22489.5.patch, 8.43 MB
        6. HIVE-22489.6.patch, 8.42 MB
        7. HIVE-22489.7.patch, 8.44 MB
        8. HIVE-22489.8.patch, 8.77 MB
        9. HIVE-22489.9.patch, 8.79 MB (attached twice)
        10. HIVE-22489.10.patch, 8.80 MB (attached twice)
        11. HIVE-22489.11.patch, 8.80 MB
        12. HIVE-22489.12.patch, 8.85 MB
        13. HIVE-22489.13.patch, 8.85 MB (attached three times)
        14. HIVE-22489.14.patch, 8.85 MB
        All patches were attached by Krisztian Kasa.


              People

              • Assignee:
                Krisztian Kasa (kkasa)
              • Reporter:
                Krisztian Kasa (kkasa)
              • Votes:
                0
              • Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: