Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-6040

to_utc_timestamp() not intuitive when cluster timezone is not set to UTC

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.12.1, 0.13.0
    • None
    • Types, UDF
    • None

    Description

      unix_timestamp() is the root of string to timestamp conversion operations.

      Therefore, when local cluster timezone is NOT set to UTC the results produced by to_utc_timestamp() may be confusing to the user.

      The query below was run on a cluster with the local timezone set to 'America/Denver'
      For reference, 1386000000 = 2013-12-02 16:00:00 GMT

      SELECT
      from_unixtime(1386000000) as test01,
      – test01 = 2013-12-02 09:00:00
      – local timezone applies to from_unixtime()
      unix_timestamp(from_unixtime(1386000000)) as test02,
      – test02 = 1386000000
      – local timezone applies to both unix_timestamp and from_unixtime()
      unix_timestamp('2013-12-02 16:00:00') as test03,
      – test03 = 1386025200
      – local timezone applies to from_unixtime()
      from_utc_timestamp(1386000000, 'UTC') as test04,
      – test04 = 1970-01-16 18:00:00
      – This demonstrates the bug from HIVE-3454
      cast(from_utc_timestamp(cast(1386000000 as DOUBLE),'UTC') as BIGINT) as test05,
      – test05 = 1386000000
      – one work-around to HIVE-3454
      cast(from_utc_timestamp(1386000000 * 1.0,'UTC') as BIGINT) as test06,
      – test06 = 1386000000
      – a second work-around to HIVE-3454
      cast(from_utc_timestamp(cast(cast(1386000000 as DOUBLE) as TIMESTAMP),'UTC') as BIGINT) as test07,
      --test07 = 1386000000
      --direct cast conversion of DOUBLE to TIMESTAMP works
      cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test08,
      --test08 = 1386025200
      – same result as test03, cast conversion uses local timezone setting
      cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'UTC') as BIGINT) as test09,
      --test09 = 1386025200
      --same as test08 demonstrating consistency even when using from_utc_timestamp()
      cast(from_utc_timestamp(cast('2013-12-02 16:00:00' as TIMESTAMP),'America/Denver') as BIGINT) as test10,
      --test10 = 1386000000
      --not exactly intuitive, but this is the correct result
      cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00'),'America/Denver') as BIGINT) as test11,
      --test11= 1360825
      --similar to test10, but producing HIVE-3454 bug
      cast(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00') * 1.0,'America/Denver') as BIGINT) as test12,
      --test12=1386000000
      --like test10/test11 with HIVE-3454 work-around.
      cast(to_utc_timestamp('2013-12-02 16:00:00','UTC') as BIGINT) as test13,
      --test13=1386025200
      --intuitively this seems to be the correct approach
      --the result is wrong.
      cast(cast('2013-12-02 16:00:00' as TIMESTAMP) as BIGINT) as test14,
      --test14=1386025200
      --this result could be confusing to the user
      cast( from_utc_timestamp( cast( '2013-12-02 16:00:00' as TIMESTAMP ), '\${system:user.timezone}' ) as BIGINT ) as test15,
      --test15=1386000000
      --like test10, using hiveconf system variable substitution
      cast( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00') * 1.0, '\${system:user.timezone}' ) as BIGINT) as test16,
      --test16=1386000000
      --like test12, using hiveconf system variable substitution
      cast( to_utc_timestamp( from_utc_timestamp( unix_timestamp( '2013-12-02 16:00:00' ) * 1.0, '\${system:user.timezone}' ), 'UTC') as BIGINT) as test17
      --test17=1386000000
      --universal work-around, produces results expected by test13
      FROM dummy

      To work with datetime string conversions using to_utc_timestamp in a way that produces consistent results, regardless of cluster timezone configuration, the following work-around is required:

      to_utc_timestamp(from_utc_timestamp(unix_timestamp('2013-12-02 16:00:00')*1.0,'\${system:user.timezone}'), '<logsource_timezone>')

      One solution could be to add a utc_timestamp() udf to hive and for to_utc_timestamp() to use that for implicit datetime string conversions.

      Attachments

        Activity

          People

            Unassigned Unassigned
            rharris Ryan Harris
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: