Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21733

'from_unixtime' does not recognise timezone

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.1.0
    • None
    • Hive, hpl/sql
    • None

    Description

      Dear experts,

      I try to convert a unix timestamp to a date using 'from_unixtime' in Hive 3.1 do not get the correct result in my local timezone but in UTC. As I have a Hive 1.2 system avaialble also, I can state that in that version, the issue does not exist.

      I performed extensive tests on systems running HDP 3.1 as well as HDP 2.6.
      For the detailled versions of the HDP stack and its components, see the beeline results below.

      I start from the European timezone "Europe/Berlin" or "CEST" (c.f. the date of the post) set on all my system:

      [user@host ~]$ date
      Wed May 15 13:48:05 CEST 2019
      [user@host ~]${code}
      
      In a next step, I launch beeline and use the following timestamp as a test-value:
      1557921076
      
      (corresponding to Wed May 15 13:51:16 2019 CEST and  Wed May 15 11:51:16 2019 UTC)
      
      On the HDP 2.6 system:
      {code:java}
      Connecting to jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;
      Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
      Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      Beeline version 1.2.1000.2.6.0.3-8 by Apache Hive
      0: jdbc:hive2://host.domain.com:2181,host> select from_unixtime(1557921076);
      _c0
      2019-05-15 13:51:16
      1 row selected (0.373 seconds)

      As you can see, using HDP 2.6 with Hive 1.2 the conversion works as expected. The timestamp is cast into a time in the local timezone. Very nice!

      On the HDP 3.1 system:

      [user@host ~]$ beeline
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
      Connecting to jdbc:hive2://host.domain.com:2181,host.domain.com:2181,host.domain.com:2181/default;principal=hive/_HOST@domain.COM;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
      19/05/15 13:55:36 [main]: INFO jdbc.HiveConnection: Connected to host.domain.com:10000
      Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
      Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
      0: jdbc:hive2://host.domain.com:2181,host.n> select from_unixtime(1557921076);
      DEBUG : Acquired the compile lock.
      INFO  : Compiling command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): select from_unixtime(1557921076)
      DEBUG : Encoding valid txns info 13519:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 txnid:13519
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:string, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time taken: 0.492 seconds
      INFO  : Executing command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514): select from_unixtime(1557921076)
      INFO  : Completed executing command(queryId=hive_20190515135655_2926bbf9-4b57-4932-a79c-96e0c7d53514); Time taken: 0.011 seconds
      INFO  : OK
      DEBUG : Shutting down query select from_unixtime(1557921076)
      +----------------------+
      |         _c0          |
      +----------------------+
      | 2019-05-15 11:51:16  |
      +----------------------+
      1 row selected (0.672 seconds)

      Here, things go the wrong way, as described in the above post. The local timezone is not recognised by Hive and the default of 'UTC' is chosen, i.e. an offset of 2h is returned.
      I was able to track the problem down a bit further. Doing the detour of converting via 'from_unix_timestamp' where providing a timezone parameter is possible, gives the correct result:

      0: jdbc:hive2://host.domain.com:2181,host.n> select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin');
      DEBUG : Acquired the compile lock.
      INFO  : Compiling command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
      DEBUG : Encoding valid txns info 13520:13517:13517:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 txnid:13520
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:timestamp, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time taken: 1.024 seconds
      INFO  : Executing command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8): select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
      INFO  : Completed executing command(queryId=hive_20190515135924_393ff49c-1ce3-4c2b-8801-c22899c0d2c8); Time taken: 0.01 seconds
      INFO  : OK
      DEBUG : Shutting down query select from_utc_timestamp(from_unixtime(1557921076), 'Europe/Berlin')
      +------------------------+
      |          _c0           |
      +------------------------+
      | 2019-05-15 13:51:16.0  |
      +------------------------+
      1 row selected (1.187 seconds)

      Giving 'CEST', however, leads to the observed problem from above:

      0: jdbc:hive2://host.domain.com:2181,host.n> select from_utc_timestamp(from_unixtime(1557921076), 'CEST');
      DEBUG : Acquired the compile lock.
      INFO  : Compiling command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
      DEBUG : Encoding valid txns info 13522:13517:13517,13521:11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 txnid:13522
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:timestamp, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time taken: 0.396 seconds
      INFO  : Executing command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa): select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
      INFO  : Completed executing command(queryId=hive_20190515140104_8b4293d0-53e8-480b-9fe4-3bdee8bc3caa); Time taken: 0.031 seconds
      INFO  : OK
      DEBUG : Shutting down query select from_utc_timestamp(from_unixtime(1557921076), 'CEST')
      +------------------------+
      |          _c0           |
      +------------------------+
      | 2019-05-15 11:51:16.0  |
      +------------------------+
      1 row selected (0.547 seconds)

      Although 'CEST' and 'Europe/Berlin' should be equivalent this time of the year, the result differs. I assume that 'CEST' is simply not recognised by Hive and the fall-back to UTC is used.
      A last test with the European winter time 'CET' gives the following result:

      0: jdbc:hive2://host.domain.com:2181,host.n> select from_utc_timestamp(from_unixtime(1557921076), 'CET');
      DEBUG : Acquired the compile lock.
      INFO  : Compiling command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): select from_utc_timestamp(from_unixtime(1557921076), 'CET')
      DEBUG : Encoding valid txns info 13525:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485,13517,13521,13523 txnid:13525
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:timestamp, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time taken: 0.777 seconds
      INFO  : Executing command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821): select from_utc_timestamp(from_unixtime(1557921076), 'CET')
      INFO  : Completed executing command(queryId=hive_20190515140616_7c41fa87-d375-49d6-ac5b-7aeabf46e821); Time taken: 0.022 seconds
      INFO  : OK
      DEBUG : Shutting down query select from_utc_timestamp(from_unixtime(1557921076), 'CET')
      +------------------------+
      |          _c0           |
      +------------------------+
      | 2019-05-15 13:51:16.0  |
      +------------------------+
      1 row selected (1.02 seconds)

      Although 'CET' differs by 1 hour from 'CEST' and 'Europe/Berlin', the result is correct. I assume that Hive realises that 'CET'-member timezones are actually in 'CEST' this time of the year and corrects accordingly.
      As nice of a work-around using 'Europe/Berlin' may look at a first glance, I suspect things to go wrong when the actual timezone is 'CET' in winter.
      I suspect 'from_unixtime' to recognise 'CET' as a valid timezone once it is the timezone of the local system, resulting in a wrongly cast 'from_unixtime' in the inner parentheses.
      A summer-/winter-time independent work-around is the following:

      0: jdbc:hive2://host.domain.com:2181,host.n> select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin');
      DEBUG : Acquired the compile lock.
      INFO  : Compiling command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
      DEBUG : Encoding valid txns info 13531:9223372036854775807::11708,11720,11725,11735,11739,11749,11751,11753,11755,11771,11779,11790,11813,11833,11875,11883,12050,12196,12310,12342,13116,13126,13128,13130,13153,13155,13157,13159,13161,13163,13166,13169,13277,13279,13281,13290,13414,13422,13424,13473,13476,13485 txnid:13531
      INFO  : Semantic Analysis Completed (retrial = false)
      INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:timestamp, comment:null)], properties:null)
      INFO  : Completed compiling command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time taken: 0.453 seconds
      INFO  : Executing command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed): select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'CET')
      INFO  : Completed executing command(queryId=hive_20190515141328_7cda82c9-4dc1-448d-a8f4-864801132aed); Time taken: 0.009 seconds
      INFO  : OK
      DEBUG : Shutting down query select from_utc_timestamp(cast(1557921076 as BIGINT)*1000, 'Europe/Berlin')
      +------------------------+
      |          _c0           |
      +------------------------+
      | 2019-05-15 13:51:16.0  |
      +------------------------+
      1 row selected (0.586 seconds)

      Here, I avoid the usage of the timezone-sensitive 'from_unixtime' entirly by casting multiplying the timestamp by a factor of 1000 to get a genuine unixtimestamp with milisecond-precision (in contrast to the truncated one with second-precision used before).
      For this trick to work, casting the truncated timestamp as BIGINT is necessary as the '* 1000'-operation would exceed the size of a regular INT.

      This neat work-around, however, does not work on the HDP 2.6/Hive 1.2 system any more, here, the the 2h offset is applied in the wrong direction. It is hence not applicable independently of the Hive-Version, unfortunately.

      I would like to understand where the original issue arises from, the ignorance of Hive towards the 'CEST' timezone (in my case).
      As it is working nicely for HDP 2.6 and Hive 1.2, I suspect a fix of this bug is possible also for HDP 3.1 and Hive 3.1.

      I am looking forward to some input from the expert community.

      Cheers

      Oliver

      Attachments

        Activity

          People

            Unassigned Unassigned
            oricken Oliver Ricken
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: