Calcite / CALCITE-1617

Druid adapter: Send timestamp literals to Druid as local time, not UTC

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.12.0
    • Component/s: druid
    • Labels: None

      Description

      In the Druid adapter, send timestamp literals to Druid as local time, not UTC. Druid will interpret them as instants in the Druid server's time zone. In SQL (and therefore in Calcite), timestamps do not have a time zone; in Joda-Time's terminology they represent a LocalDateTime rather than an Instant.

      In the following, "1999-11-10T00:00:00.000Z" should be "1999-11-10T00:00:00.000".

      hive> explain SELECT DISTINCT `__time`
          > FROM store_sales_sold_time_subset
          > WHERE `__time` < '1999-11-10 00:00:00';
      OK
      Plan optimized by CBO.
      
      Stage-0
        Fetch Operator
          limit:-1
          Select Operator [SEL_1]
            Output:["_col0"]
            TableScan [TS_0]
              Output:["__time"],properties:{"druid.query.json":"{\"queryType\":\"timeseries\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"descending\":false,\"granularity\":\"NONE\",\"aggregations\":[],\"intervals\":[\"1900-01-01T00:00:00.000Z/1999-11-10T00:00:00.000Z\"]}","druid.query.type":"timeseries"}
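
The fix boils down to serializing the timestamp literal with a zone-less ISO-8601 format rather than an instant format. A minimal sketch using `java.time` (the class and method names here are illustrative, not the adapter's actual code):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LocalTimestampFormat {
  // A SQL timestamp has no time zone, so render it as a LocalDateTime
  // ("1999-11-10T00:00:00.000"), not as an instant with a trailing "Z".
  static final DateTimeFormatter LOCAL_ISO =
      DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS");

  static String toLocalIso(LocalDateTime ts) {
    return ts.format(LOCAL_ISO);
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(1999, 11, 10, 0, 0);
    System.out.println(toLocalIso(ts)); // prints 1999-11-10T00:00:00.000
  }
}
```

Druid then interprets the zone-less endpoint in the server's time zone instead of being forced to UTC.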
      

        Activity

        julianhyde Julian Hyde added a comment -

        Resolved in release 1.12.0 (2017-03-24).

        julianhyde Julian Hyde added a comment -

        Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/6759218b .
        julianhyde Julian Hyde added a comment -

        Ok, apparently https://github.com/druid-io/druid/pull/3905 is a duplicate of https://github.com/druid-io/druid/pull/3216 and fixed in 0.9.2.
        julianhyde Julian Hyde added a comment -

        I have logged https://github.com/druid-io/druid/issues/3905 for the http-500 caused by the "select distinct" query.

        gian Gian Merlino added a comment -

        slim bouguerra, https://github.com/druid-io/druid/pull/3850 is a step towards improvement but I still wouldn't want to support a Druid cluster for a customer running in a non UTC server time zone. Fwiw, for Druid's SQL layer I was thinking of having a "connection time zone" concept that is applied just to SQL planning and affects the translation of SQL to Druid queries. Maybe that would work for Hive to Druid too.
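
One way to read the "connection time zone" idea: the zone is consulted only during SQL planning, to turn a zone-less timestamp literal into the instant that goes into the generated Druid query. A hypothetical sketch under that assumption, not Druid's actual SQL layer:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ConnectionTimeZone {
  // Interpret a zone-less SQL timestamp under the connection's time zone
  // and return the instant a Druid filter or interval should use.
  static Instant toInstant(LocalDateTime local, ZoneId connectionZone) {
    return local.atZone(connectionZone).toInstant();
  }

  public static void main(String[] args) {
    LocalDateTime ts = LocalDateTime.of(2015, 10, 12, 0, 0);
    // On this date America/New_York is UTC-4, so midnight local is 04:00Z.
    System.out.println(toInstant(ts, ZoneId.of("America/New_York")));
  }
}
```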

        jcamachorodriguez Jesus Camacho Rodriguez added a comment -

        Julian Hyde, the Exception you mention (500 error) is reproducible for multiple queries in DruidAdapterIT even without any changes in master. I have not been able to explore the reason yet.

        bslim slim bouguerra added a comment -

        Julian Hyde, I am using Druid 0.9.2.1 (slightly patched); can you share the exception stack or how to reproduce it?
        I am sure this query works for me:

        hive> select distinct `__time` from druid_table where `__time` < "2016-07-01 00:00:00";
        OK
        2015-01-01 00:00:00
        2016-01-01 00:00:00
        2016-06-01 02:00:00
        Time taken: 0.35 seconds, Fetched: 3 row(s)
        hive> explain select distinct `__time` from druid_table where `__time` < "2016-07-01 00:00:00";
        OK
        Plan optimized by CBO.
        
        Stage-0
          Fetch Operator
            limit:-1
            Select Operator [SEL_1]
              Output:["_col0"]
              TableScan [TS_0]
                Output:["__time"],properties:{"druid.query.json":"{\"queryType\":\"timeseries\",\"dataSource\":\"druid_hive_table\",\"descending\":false,\"granularity\":\"NONE\",\"aggregations\":[],\"intervals\":[\"1900-01-01T00:00:00.000Z/2016-07-01T00:00:00.000Z\"]}","druid.query.type":"timeseries"}
        
        Time taken: 0.252 seconds, Fetched: 10 row(s)
        
        julianhyde Julian Hyde added a comment -

        slim bouguerra, the query gives an HTTP 500 even if I remove the "Z"s. It's nothing to do with time zones, but I hit it trying to reproduce your Hive query. The SQL query is

        select distinct "__time"
        from "wikiticker"
        where "__time" < '2015-10-12 00:00:00'
        

        and if Druid has a problem I guess we should generate a different Druid query for this SQL query.

        bslim slim bouguerra added a comment - edited

        Gian Merlino, I thought that this PR https://github.com/druid-io/druid/pull/3850 fixes most of the pain points on this list: https://github.com/druid-io/druid/issues/2619?

        bslim slim bouguerra added a comment -

        Gian Merlino, thanks for the input, but I feel I would rather fix the issue to make it possible to run in a different time zone.
        The reason we are hitting this is that some Hive data will be stored as time-zone-less. Then if we feed it to Druid running on UTC, it will be assumed to be UTC time, which is wrong (kind of). If a Hive cluster is running on EST, most likely the data is EST as well, even if that is not specified.
        So I thought the best way to handle this is to make sure that Druid and Hive will be running on the same time zone setting; thus an interval like "1900-01-01T00:00:00/2015-10-12T00:00:00" will get converted to the Druid process time zone.
        Hope you get my point?
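
The conversion described above could be sketched like this: parse each zone-less endpoint of the interval and pin it to the Druid process time zone (a hypothetical helper, not Hive or Druid code):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class IntervalPinning {
  // Pin both zone-less endpoints of an ISO interval ("start/end") to the
  // given zone, yielding the concrete instants Druid would scan.
  static String pinInterval(String interval, ZoneId processZone) {
    String[] ends = interval.split("/");
    return LocalDateTime.parse(ends[0]).atZone(processZone).toInstant()
        + "/"
        + LocalDateTime.parse(ends[1]).atZone(processZone).toInstant();
  }

  public static void main(String[] args) {
    System.out.println(
        pinInterval("1900-01-01T00:00:00/2015-10-12T00:00:00", ZoneId.of("UTC")));
    // prints 1900-01-01T00:00:00Z/2015-10-12T00:00:00Z
  }
}
```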

        gian Gian Merlino added a comment -

        Druid does allow time zones to be specified on pretty much anything time related though, so you can still do time zone-y things (like floor to day in any time zone you want). But there isn't really a concept of server default timezone. It's all based on what's specified in each individual query.

        gian Gian Merlino added a comment -

        Fwiw, a properly set up Druid cluster will always have local time UTC, so these two should be equivalent. Running Druid in non-UTC local times isn't supported and various things don't work right. See https://github.com/druid-io/druid/issues/2619 for details.

        bslim slim bouguerra added a comment - edited

        Julian Hyde, which syntax?
        I still see `"1900-01-01T00:00:00.000Z/2015-10-12T00:00:00.000Z"` in the query, though I thought we were dropping the time zone field?
        I expected this to be "1900-01-01T00:00:00/2015-10-12T00:00:00", though.

        julianhyde Julian Hyde added a comment -

        slim bouguerra, Druid gives an HTTP 500. Is this syntax only supported in a recent Druid?

        Caused by: java.lang.RuntimeException: Error while processing druid request [{"queryType":"timeseries","dataSource":"wikiticker","descending":false,"granularity":"none","aggregations":[],"intervals":["1900-01-01T00:00:00.000Z/2015-10-12T00:00:00.000Z"]}]
        	at org.apache.calcite.adapter.druid.DruidConnectionImpl.request(DruidConnectionImpl.java:105)
        	at org.apache.calcite.adapter.druid.DruidQuery$DruidQueryNode.run(DruidQuery.java:943)
        	at org.apache.calcite.interpreter.Interpreter.start(Interpreter.java:116)
        	at org.apache.calcite.interpreter.Interpreter.enumerator(Interpreter.java:94)
        	at org.apache.calcite.linq4j.EnumerableDefaults$15$1.<init>(EnumerableDefaults.java:1896)
        	at org.apache.calcite.linq4j.EnumerableDefaults$15.enumerator(EnumerableDefaults.java:1895)
        	at org.apache.calcite.linq4j.EnumerableDefaults$27.enumerator(EnumerableDefaults.java:2515)
        	at org.apache.calcite.linq4j.AbstractEnumerable.iterator(AbstractEnumerable.java:33)
        	at org.apache.calcite.avatica.MetaImpl.createCursor(MetaImpl.java:81)
        	at org.apache.calcite.avatica.AvaticaResultSet.execute(AvaticaResultSet.java:196)
        	at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:67)
        	at org.apache.calcite.jdbc.CalciteResultSet.execute(CalciteResultSet.java:44)
        	at org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:607)
        	at org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:600)
        	at org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:615)
        	at org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:148)
        	... 41 more
        Caused by: java.io.IOException: Server returned HTTP response code: 500 for URL: http://localhost:8082/druid/v2/?pretty
        	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1840)
        	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
        
        

          People

          • Assignee: julianhyde Julian Hyde
          • Reporter: julianhyde Julian Hyde
          • Votes: 0
          • Watchers: 5
