Hive / HIVE-13837

current_timestamp() output format is different in some cases


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Labels: None

    Description

      As jdere reports:

      The current_timestamp() UDF returns results in different formats depending on the query.
      
      select current_timestamp() returns a result with fractional-second precision:
      {noformat}
      hive> select current_timestamp();
      OK
      2016-04-14 18:26:58.875
      Time taken: 0.077 seconds, Fetched: 1 row(s)
      {noformat}
      
      But the output format differs for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
      {noformat}
      hive> select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
      Query ID = hrt_qa_20160414182956_c4ed48f2-9913-4b3b-8f09-668ebf55b3e3
      Total jobs = 1
      Launching Job 1 out of 1
      Tez session was closed. Reopening...
      Session re-established.
      
      
      Status: Running (Executing on YARN cluster with App id application_1460611908643_0624)
      
      ----------------------------------------------------------------------------------------------
              VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
      ----------------------------------------------------------------------------------------------
      Map 1 ..........      llap     SUCCEEDED      1          1        0        0       0       0  
      Map 4 ..........      llap     SUCCEEDED      1          1        0        0       0       0  
      Reducer 3 ......      llap     SUCCEEDED      1          1        0        0       0       0  
      ----------------------------------------------------------------------------------------------
      VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 0.92 s     
      ----------------------------------------------------------------------------------------------
      OK
      2016-04-14 18:29:56
      Time taken: 10.558 seconds, Fetched: 1 row(s)
      {noformat}
      
      Explain plan for select current_timestamp():
      {noformat}
      hive> explain extended select current_timestamp();
      OK
      ABSTRACT SYNTAX TREE:
        
      TOK_QUERY
         TOK_INSERT
            TOK_DESTINATION
               TOK_DIR
                  TOK_TMP_FILE
            TOK_SELECT
               TOK_SELEXPR
                  TOK_FUNCTION
                     current_timestamp
      
      
      STAGE DEPENDENCIES:
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              TableScan
                alias: _dummy_table
                Row Limit Per Split: 1
                GatherStats: false
                Select Operator
                  expressions: 2016-04-14 18:30:57.206 (type: timestamp)
                  outputColumnNames: _col0
                  ListSink
      
      Time taken: 0.062 seconds, Fetched: 30 row(s)
      {noformat}
      
      Explain plan for select current_timestamp() from all100k union select current_timestamp() from over100k limit 5:
      {noformat}
      hive> explain extended select current_timestamp() from all100k union select current_timestamp() from over100k limit 5;
      OK
      ABSTRACT SYNTAX TREE:
        
      TOK_QUERY
         TOK_FROM
            TOK_SUBQUERY
               TOK_QUERY
                  TOK_FROM
                     TOK_SUBQUERY
                        TOK_UNIONALL
                           TOK_QUERY
                              TOK_FROM
                                 TOK_TABREF
                                    TOK_TABNAME
                                       all100k
                              TOK_INSERT
                                 TOK_DESTINATION
                                    TOK_DIR
                                       TOK_TMP_FILE
                                 TOK_SELECT
                                    TOK_SELEXPR
                                       TOK_FUNCTION
                                          current_timestamp
                           TOK_QUERY
                              TOK_FROM
                                 TOK_TABREF
                                    TOK_TABNAME
                                       over100k
                              TOK_INSERT
                                 TOK_DESTINATION
                                    TOK_DIR
                                       TOK_TMP_FILE
                                 TOK_SELECT
                                    TOK_SELEXPR
                                       TOK_FUNCTION
                                          current_timestamp
                        _u1
                  TOK_INSERT
                     TOK_DESTINATION
                        TOK_DIR
                           TOK_TMP_FILE
                     TOK_SELECTDI
                        TOK_SELEXPR
                           TOK_ALLCOLREF
               _u2
         TOK_INSERT
            TOK_DESTINATION
               TOK_DIR
                  TOK_TMP_FILE
            TOK_SELECT
               TOK_SELEXPR
                  TOK_ALLCOLREF
            TOK_LIMIT
               5
      
      
      STAGE DEPENDENCIES:
        Stage-1 is a root stage
        Stage-0 depends on stages: Stage-1
      
      STAGE PLANS:
        Stage: Stage-1
          Tez
            DagId: hrt_qa_20160414183119_ec8e109e-8975-4799-a142-4a2289f85910:7
            Edges:
              Map 1 <- Union 2 (CONTAINS)
              Map 4 <- Union 2 (CONTAINS)
              Reducer 3 <- Union 2 (SIMPLE_EDGE)
            DagName: 
            Vertices:
              Map 1 
                  Map Operator Tree:
                      TableScan
                        alias: all100k
                        Statistics: Num rows: 100000 Data size: 15801336 Basic stats: COMPLETE Column stats: COMPLETE
                        GatherStats: false
                        Select Operator
                          Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: 2016-04-14 18:31:19.0 (type: timestamp)
                            outputColumnNames: _col0
                            Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
                            Group By Operator
                              keys: _col0 (type: timestamp)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col0 (type: timestamp)
                                null sort order: a
                                sort order: +
                                Map-reduce partition columns: _col0 (type: timestamp)
                                Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                                tag: -1
                                TopN: 5
                                TopN Hash Memory Usage: 0.04
                                auto parallelism: true
                  Execution mode: llap
                  LLAP IO: no inputs
                  Path -> Alias:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
                  Path -> Partition:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k 
                      Partition
                        base file name: all100k
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        properties:
                          COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
                          EXTERNAL TRUE
                          bucket_count -1
                          columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
                          columns.comments 
                          columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
                          field.delim |
                          file.inputformat org.apache.hadoop.mapred.TextInputFormat
                          file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
                          name default.all100k
                          numFiles 1
                          numRows 100000
                          rawDataSize 15801336
                          serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
                          serialization.format |
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          totalSize 15901336
                          transient_lastDdlTime 1460612683
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          properties:
                            COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","s":"true","dc":"true","bo":"true","v":"true","c":"true","ts":"true"}}
                            EXTERNAL TRUE
                            bucket_count -1
                            columns t,si,i,b,f,d,s,dc,bo,v,c,ts,dt
                            columns.comments 
                            columns.types tinyint:smallint:int:bigint:float:double:string:decimal(38,18):boolean:varchar(25):char(25):timestamp:date
                            field.delim |
                            file.inputformat org.apache.hadoop.mapred.TextInputFormat
                            file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k
                            name default.all100k
                            numFiles 1
                            numRows 100000
                            rawDataSize 15801336
                            serialization.ddl struct all100k { byte t, i16 si, i32 i, i64 b, float f, double d, string s, decimal(38,18) dc, bool bo, varchar(25) v, char(25) c, timestamp ts, date dt}
                            serialization.format |
                            serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            totalSize 15901336
                            transient_lastDdlTime 1460612683
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.all100k
                        name: default.all100k
                  Truncated Path -> Alias:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/all100k [all100k]
              Map 4 
                  Map Operator Tree:
                      TableScan
                        alias: over100k
                        Statistics: Num rows: 100000 Data size: 6631229 Basic stats: COMPLETE Column stats: COMPLETE
                        GatherStats: false
                        Select Operator
                          Statistics: Num rows: 100000 Data size: 4000000 Basic stats: COMPLETE Column stats: COMPLETE
                          Select Operator
                            expressions: 2016-04-14 18:31:19.0 (type: timestamp)
                            outputColumnNames: _col0
                            Statistics: Num rows: 200000 Data size: 8000000 Basic stats: COMPLETE Column stats: COMPLETE
                            Group By Operator
                              keys: _col0 (type: timestamp)
                              mode: hash
                              outputColumnNames: _col0
                              Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                              Reduce Output Operator
                                key expressions: _col0 (type: timestamp)
                                null sort order: a
                                sort order: +
                                Map-reduce partition columns: _col0 (type: timestamp)
                                Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                                tag: -1
                                TopN: 5
                                TopN Hash Memory Usage: 0.04
                                auto parallelism: true
                  Execution mode: llap
                  LLAP IO: no inputs
                  Path -> Alias:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
                  Path -> Partition:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k 
                      Partition
                        base file name: over100k
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        properties:
                          COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
                          EXTERNAL TRUE
                          bucket_count -1
                          columns t,si,i,b,f,d,bo,s,bin
                          columns.comments 
                          columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
                          field.delim :
                          file.inputformat org.apache.hadoop.mapred.TextInputFormat
                          file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
                          name default.over100k
                          numFiles 1
                          numRows 100000
                          rawDataSize 6631229
                          serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
                          serialization.format :
                          serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          totalSize 6731229
                          transient_lastDdlTime 1460612798
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                      
                          input format: org.apache.hadoop.mapred.TextInputFormat
                          output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          properties:
                            COLUMN_STATS_ACCURATE {"BASIC_STATS":"true","COLUMN_STATS":{"t":"true","si":"true","i":"true","b":"true","f":"true","d":"true","bo":"true","s":"true","bin":"true"}}
                            EXTERNAL TRUE
                            bucket_count -1
                            columns t,si,i,b,f,d,bo,s,bin
                            columns.comments 
                            columns.types tinyint:smallint:int:bigint:float:double:boolean:string:binary
                            field.delim :
                            file.inputformat org.apache.hadoop.mapred.TextInputFormat
                            file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            location hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k
                            name default.over100k
                            numFiles 1
                            numRows 100000
                            rawDataSize 6631229
                            serialization.ddl struct over100k { byte t, i16 si, i32 i, i64 b, float f, double d, bool bo, string s, binary bin}
                            serialization.format :
                            serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                            totalSize 6731229
                            transient_lastDdlTime 1460612798
                          serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          name: default.over100k
                        name: default.over100k
                  Truncated Path -> Alias:
                    hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/user/hcat/tests/data/over100k [over100k]
              Reducer 3 
                  Execution mode: vectorized, llap
                  Needs Tagging: false
                  Reduce Operator Tree:
                    Group By Operator
                      keys: KEY._col0 (type: timestamp)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                      Limit
                        Number of rows: 5
                        Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                        File Output Operator
                          compressed: false
                          GlobalTableId: 0
                          directory: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002
                          NumFilesPerFileSink: 1
                          Statistics: Num rows: 1 Data size: 40 Basic stats: COMPLETE Column stats: COMPLETE
                          Stats Publishing Key Prefix: hdfs://os-r6-qugztu-hive-1-5.novalocal:8020/tmp/hive/hrt_qa/ec0773d7-0ac2-45c7-b9cb-568bbed2c49c/hive_2016-04-14_18-31-19_532_3480081382837900888-1/-mr-10001/.hive-staging_hive_2016-04-14_18-31-19_532_3480081382837900888-1/-ext-10002/
                          table:
                              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                              properties:
                                columns _col0
                                columns.types timestamp
                                escape.delim \
                                hive.serialization.extend.additional.nesting.levels true
                                serialization.escape.crlf true
                                serialization.format 1
                                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                          TotalFiles: 1
                          GatherStats: false
                          MultiFileSpray: false
              Union 2 
                  Vertex: Union 2
      
        Stage: Stage-0
          Fetch Operator
            limit: 5
            Processor Tree:
              ListSink
      
      Time taken: 0.301 seconds, Fetched: 284 row(s)
      {noformat}
      
      Both of these queries returned timestamps in YYYY-MM-DD HH:MM:SS.fff format in past releases.
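
      Note that the two result formats match two different Java formatting paths. This is only an illustration of how the discrepancy can arise, not a claim about Hive's actual code: java.sql.Timestamp.toString() always emits a fractional-second part (at minimum ".0", as seen in the second explain plan's folded constant), while a SimpleDateFormat pattern without a fraction field silently drops it (matching the "2016-04-14 18:29:56" output of the union query).

      ```java
      import java.sql.Timestamp;
      import java.text.SimpleDateFormat;

      public class TimestampFormatDemo {
          public static void main(String[] args) {
              Timestamp ts = Timestamp.valueOf("2016-04-14 18:26:58.875");

              // Timestamp.toString() keeps the fractional seconds.
              System.out.println(ts.toString());          // 2016-04-14 18:26:58.875

              // A pattern with no fraction field drops them entirely.
              SimpleDateFormat noFraction = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
              System.out.println(noFraction.format(ts));  // 2016-04-14 18:26:58

              // With zero nanos, toString() still appends ".0" rather than nothing.
              Timestamp whole = Timestamp.valueOf("2016-04-14 18:31:19");
              System.out.println(whole.toString());       // 2016-04-14 18:31:19.0
          }
      }
      ```

      Whichever path a given query plan takes determines whether the fractional part survives, which is consistent with the behavior reported above.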
      

      Attachments

        1. HIVE-13837.01.patch
          11 kB
          Pengcheng Xiong
        2. HIVE-13837.02.patch
          11 kB
          Pengcheng Xiong


          People

            Assignee: pxiong Pengcheng Xiong
            Reporter: pxiong Pengcheng Xiong
            Votes: 0
            Watchers: 2
