Uploaded image for project: 'Calcite'
  1. Calcite
  2. CALCITE-1947

Add time/timestamp with local time zone types to optimizer

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.14.0
    • Component/s: core
    • Labels:
      None

      Description

      Implementation would be similar to PostgreSQL (the time/timestamp with timezone types do not store the timezone) and conversion from/to timezone-less types follows similar semantics.

      https://www.postgresql.org/docs/9.6/static/functions-datetime.html

      This would also allow us to integrate easily with Hive and Druid, which follow similar storage models/semantics for timestamp with timezone.

      Follow-up work will be needed to introduce these new types in Avatica and extend Calcite SQL parser.

        Issue Links

          Activity

          Hide
          michaelmior Michael Mior added a comment -

          Resolved in release 1.14.0 (2017-10-01)

          Show
          michaelmior Michael Mior added a comment - Resolved in release 1.14.0 (2017-10-01)
          Hide
          julianhyde Julian Hyde added a comment -
          Show
          julianhyde Julian Hyde added a comment - A couple more fixes in http://git-wip-us.apache.org/repos/asf/calcite/commit/67071b6b .
          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Fixed javadoc in http://git-wip-us.apache.org/repos/asf/calcite/commit/172db20 .
          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Fixed in http://git-wip-us.apache.org/repos/asf/calcite/commit/939c9a6 .
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, thanks for the review.

          I renamed those variables. I also removed the hardcoded variables from the Schemas.java and added a reference to the new time in reference.md.

          I am going to run the Druid testsuite with latest changes right now and I will push it shortly.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , thanks for the review. I renamed those variables. I also removed the hardcoded variables from the Schemas.java and added a reference to the new time in reference.md . I am going to run the Druid testsuite with latest changes right now and I will push it shortly.
          Hide
          julianhyde Julian Hyde added a comment -

          +1 when you've made the (mainly cosmetic) fixes below.

          I was a bit confused until I realized that the TimestampWithTimeZone and TimeWithTimeZone classes are basically just for future use.

          The SQL type is "TIMESTAMP WITH LOCAL TIME ZONE" not "TIMESTAMP WITH LOCAL TIMEZONE". Can you rename SqlTypeName.TIMESTAMP_WITH_LOCAL_TIMEZONE accordingly. Similarly TIME WITH LOCAL TIME ZONE. Also search for "timezone" in comments, and change BuiltInMethod constants that contain "TIMEZONE".

          In Schemas.java you changed TIME_ZONE and CURRENT_TIMESTAMP to hard-coded constants. A mistake, I think.

          Can you add entries to the table in reference.md.

          Show
          julianhyde Julian Hyde added a comment - +1 when you've made the (mainly cosmetic) fixes below. I was a bit confused until I realized that the TimestampWithTimeZone and TimeWithTimeZone classes are basically just for future use. The SQL type is "TIMESTAMP WITH LOCAL TIME ZONE" not "TIMESTAMP WITH LOCAL TIMEZONE". Can you rename SqlTypeName.TIMESTAMP_WITH_LOCAL_TIMEZONE accordingly. Similarly TIME WITH LOCAL TIME ZONE. Also search for "timezone" in comments, and change BuiltInMethod constants that contain "TIMEZONE". In Schemas.java you changed TIME_ZONE and CURRENT_TIMESTAMP to hard-coded constants. A mistake, I think. Can you add entries to the table in reference.md.
          Hide
          julianhyde Julian Hyde added a comment -

          I will try to review this tonight so that it can get into 1.14.

          Show
          julianhyde Julian Hyde added a comment - I will try to review this tonight so that it can get into 1.14.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

          As we discussed previously, the semantics of "timestamp with local time zone" type is not only similar to Hive semantics, but also to Postgres "timestamp with time zone" type and Oracle "timestamp with local time zone". It is indeed an instant, but that instant is interpreted in the local time zone (which is not necessarily the JVM time zone but rather the property value for the local time zone).

          One solution is to rename class TimestampWithLocalTimeZoneString to TimestampWithTimeZoneString. Because you have implemented a literal for TIMESTAMP WITH TIME ZONE. It's confusing to pretend otherwise.

          That sounds good to me. In fact, we do indeed call the Hive literal timestamp with time zone. We could use that literal to store "timestamp with time zone" when we have an implementation in the future.

          I have pushed a new commit to the PR that I think will address your concerns: it represents internally the time/timestamp with local time zone types as TimeString/TimestampString, and then relies on the logic in TimeWithTimeZone/TimestampWithTimeZoneString when it needs to make any transformation. Initially I thought that doing that would be more elaborated, but then I realized I had all the information I needed when we need to do the interpretation of the literal, e.g., for constant reduction. Please, let me know what you think. Thanks!

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited As we discussed previously, the semantics of "timestamp with local time zone" type is not only similar to Hive semantics, but also to Postgres "timestamp with time zone" type and Oracle "timestamp with local time zone". It is indeed an instant, but that instant is interpreted in the local time zone (which is not necessarily the JVM time zone but rather the property value for the local time zone). One solution is to rename class TimestampWithLocalTimeZoneString to TimestampWithTimeZoneString. Because you have implemented a literal for TIMESTAMP WITH TIME ZONE. It's confusing to pretend otherwise. That sounds good to me. In fact, we do indeed call the Hive literal timestamp with time zone. We could use that literal to store "timestamp with time zone" when we have an implementation in the future. I have pushed a new commit to the PR that I think will address your concerns: it represents internally the time/timestamp with local time zone types as TimeString/TimestampString, and then relies on the logic in TimeWithTimeZone/TimestampWithTimeZoneString when it needs to make any transformation. Initially I thought that doing that would be more elaborated, but then I realized I had all the information I needed when we need to do the interpretation of the literal, e.g., for constant reduction. Please, let me know what you think. Thanks!
          Hide
          julianhyde Julian Hyde added a comment -

          I'm still really confused by the semantics of this.

          One solution is to rename class TimestampWithLocalTimeZoneString to TimestampWithTimeZoneString. Because you have implemented a literal for TIMESTAMP WITH TIME ZONE. It's confusing to pretend otherwise.

          Now, if you need to know the time zone during simplification, that's fine, but put that time zone in the simplifier, not the literal.

          I know you're trying to produce something that models Hive's semantics. I find Hive's semantics confusing, but my understanding was that they were just like java.util.Date: an instant, whose value is stored relative to epoch UTC, and which does not contain a time zone (albeit it uses JVM's time zone if you are foolish enough to call java.util.Date.toString). We ran with that concept, and we needed a name for the SQL type, and we decided to go with TIMESTAMP WITH LOCAL TIME ZONE.

          Maybe the name of that SQL type is confusing, and people expect there to be a time zone stored inside it. If that's the case, let's find a better name for the SQL type.

          Show
          julianhyde Julian Hyde added a comment - I'm still really confused by the semantics of this. One solution is to rename class TimestampWithLocalTimeZoneString to TimestampWithTimeZoneString. Because you have implemented a literal for TIMESTAMP WITH TIME ZONE. It's confusing to pretend otherwise. Now, if you need to know the time zone during simplification, that's fine, but put that time zone in the simplifier, not the literal. I know you're trying to produce something that models Hive's semantics. I find Hive's semantics confusing, but my understanding was that they were just like java.util.Date: an instant, whose value is stored relative to epoch UTC, and which does not contain a time zone (albeit it uses JVM's time zone if you are foolish enough to call java.util.Date.toString). We ran with that concept, and we needed a name for the SQL type, and we decided to go with TIMESTAMP WITH LOCAL TIME ZONE. Maybe the name of that SQL type is confusing, and people expect there to be a time zone stored inside it. If that's the case, let's find a better name for the SQL type.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, I have uploaded a fix for the compareTo method.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have uploaded a fix for the compareTo method.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited

          Thanks for the feedback Julian Hyde.

          We are on shaky ground if the literal has more information content than the runtime value.

          Statically we have information about the timezone that we can use to perform some optimizations, e.g., constant reduction. I decided to include that information within the value because I was facing issues to retrieve the timezone information when doing optimizations on RexNode, since all the related utilities are executed without context information. Basically what I am doing is passing that context information when I am creating the RexNode value, rather than passing context information to code that should be context independent (except for this bit). Do you have another idea on how to architect this part?

          Also, if I create TimestampWithLocalTimeZoneString("1970-01-01 08:00:00", "PST") and TimestampWithLocalTimeZoneString("1970-01-01 05:00:00", "EST") these values are equal, and have identical values in their internal fields. Their value is the epoch - either 0L or "1970-01-01 00:00:00", I don't care how you represent it internally.

          I agree, I will change the compareTo method to normalize both values to UTC, that would be give the correct behavior.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - - edited Thanks for the feedback Julian Hyde . We are on shaky ground if the literal has more information content than the runtime value. Statically we have information about the timezone that we can use to perform some optimizations, e.g., constant reduction. I decided to include that information within the value because I was facing issues to retrieve the timezone information when doing optimizations on RexNode, since all the related utilities are executed without context information. Basically what I am doing is passing that context information when I am creating the RexNode value, rather than passing context information to code that should be context independent (except for this bit). Do you have another idea on how to architect this part? Also, if I create TimestampWithLocalTimeZoneString("1970-01-01 08:00:00", "PST") and TimestampWithLocalTimeZoneString("1970-01-01 05:00:00", "EST") these values are equal, and have identical values in their internal fields. Their value is the epoch - either 0L or "1970-01-01 00:00:00", I don't care how you represent it internally. I agree, I will change the compareTo method to normalize both values to UTC, that would be give the correct behavior.
          Hide
          julianhyde Julian Hyde added a comment -

          The information content of a TIMESTAMP WITH LOCAL TIME ZONE value is just a 64 bit integer, no time zone. So it doesn't seem right that the TimeWithLocalTimeZoneString has a time zone inside it. We are on shaky ground if the literal has more information content than the runtime value.

          This change adds a new data type, and we ought to nail down the semantics, which would include tests of various queries and also tests for JDBC driver support. I think it is unreasonable for me to expect all of that, but the time zone field in TimeWithLocalTimeZoneString needs to go, otherwise we will be tempted to give this the same semantics as TIMESTAMP WITH TIME ZONE, which it emphatically does not.

          Can you also update reference.md with the new data types.

          In Hive (and the semantics I assume you are trying to achieve here), is the time zone offset applied when the query is parsed, or when it is executed? I would assume parse time. If so, if I, in PST, parse a query with TIMESTAMP WITH LOCAL TIME ZONE '1970-01-01 08:00:00', I would expect it to create a TimestampWithLocalTimeZone(0L).

          Also, if I create TimestampWithLocalTimeZoneString("1970-01-01 08:00:00", "PST") and TimestampWithLocalTimeZoneString("1970-01-01 05:00:00", "EST") these values are equal, and have identical values in their internal fields. Their value is the epoch - either 0L or "1970-01-01 00:00:00", I don't care how you represent it internally.

          Show
          julianhyde Julian Hyde added a comment - The information content of a TIMESTAMP WITH LOCAL TIME ZONE value is just a 64 bit integer, no time zone. So it doesn't seem right that the TimeWithLocalTimeZoneString has a time zone inside it. We are on shaky ground if the literal has more information content than the runtime value. This change adds a new data type, and we ought to nail down the semantics, which would include tests of various queries and also tests for JDBC driver support. I think it is unreasonable for me to expect all of that, but the time zone field in TimeWithLocalTimeZoneString needs to go, otherwise we will be tempted to give this the same semantics as TIMESTAMP WITH TIME ZONE, which it emphatically does not. Can you also update reference.md with the new data types. In Hive (and the semantics I assume you are trying to achieve here), is the time zone offset applied when the query is parsed, or when it is executed? I would assume parse time. If so, if I, in PST, parse a query with TIMESTAMP WITH LOCAL TIME ZONE '1970-01-01 08:00:00' , I would expect it to create a TimestampWithLocalTimeZone(0L). Also, if I create TimestampWithLocalTimeZoneString("1970-01-01 08:00:00", "PST") and TimestampWithLocalTimeZoneString("1970-01-01 05:00:00", "EST") these values are equal, and have identical values in their internal fields. Their value is the epoch - either 0L or "1970-01-01 00:00:00", I don't care how you represent it internally.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, I have updated the PR to refer to TIME/TIMESTAMP WITH LOCAL TIME ZONE and done some refactoring to remove some duplicate code.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , I have updated the PR to refer to TIME/TIMESTAMP WITH LOCAL TIME ZONE and done some refactoring to remove some duplicate code.
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Thanks for the quick feedback Julian Hyde.

          Indeed this implementation is not consistent with the standard. However, TIMESTAMP WITH LOCAL TIME ZONE seems to have similar semantics to Postgres indeed, thus it seems a proper name for the type. I will check it into more detail and change the name accordingly.

          I will also refactor TimestampString / TimestampWithTimeZoneString and update the PR.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Thanks for the quick feedback Julian Hyde . Indeed this implementation is not consistent with the standard. However, TIMESTAMP WITH LOCAL TIME ZONE seems to have similar semantics to Postgres indeed, thus it seems a proper name for the type. I will check it into more detail and change the name accordingly. I will also refactor TimestampString / TimestampWithTimeZoneString and update the PR.
          Hide
          julianhyde Julian Hyde added a comment -

          I don't think that TIMESTAMP WITH TIME ZONE data type is consistent with that in the SQL standard. The standard type requires storing an extra 2 bytes of information. So, I think we need to find another name for it.

          PostgreSQL has a TIMESTAMP WITH TIME ZONE type similar to the one you propose, but they're making the same mistake.

          It seems better to follow what Oracle does. Your type is somewhat similar to Oracle's TIMESTAMP WITH LOCAL TIME ZONE, except that Oracle converts such timestamps to the database's local time zone, whereas this type would convert to UTC. Oracle also has a TIMESTAMP WITH TIME ZONE type that has the requisite extra 2 bytes and seems to be compliant with the standard.

          If you think it is similar enough, we could call this TIMESTAMP WITH LOCAL TIME ZONE. Can you research what other databases do?

          Also, could you save some copy-paste by creating a common (package-protected, abstract) base class for TimestampWithTimeZoneString and TimestampString?

          Show
          julianhyde Julian Hyde added a comment - I don't think that TIMESTAMP WITH TIME ZONE data type is consistent with that in the SQL standard. The standard type requires storing an extra 2 bytes of information. So, I think we need to find another name for it. PostgreSQL has a TIMESTAMP WITH TIME ZONE type similar to the one you propose, but they're making the same mistake. It seems better to follow what Oracle does. Your type is somewhat similar to Oracle's TIMESTAMP WITH LOCAL TIME ZONE , except that Oracle converts such timestamps to the database's local time zone, whereas this type would convert to UTC. Oracle also has a TIMESTAMP WITH TIME ZONE type that has the requisite extra 2 bytes and seems to be compliant with the standard. If you think it is similar enough, we could call this TIMESTAMP WITH LOCAL TIME ZONE . Can you research what other databases do? Also, could you save some copy-paste by creating a common (package-protected, abstract) base class for TimestampWithTimeZoneString and TimestampString ?
          Hide
          jcamachorodriguez Jesus Camacho Rodriguez added a comment -

          Julian Hyde, could you review this one?
          https://github.com/apache/calcite/pull/519

          I would like to make sure I have touched all the right parts and I did not miss anything. We will need to extend Avatica in a follow-up, thus I changed the DruidAdapterIT tests that were reading the 'timestamp with timezone' column accordingly with a cast (at least for the time being). Once Avatica is extended and version consumed by Calcite bumped up, this restriction does not need to exist anymore.
          Another pending item is the extension of the SQL parser, etc.
          However, this patch should be enough to allow systems such as Hive consume / use the timestamp with timezone type.

          Show
          jcamachorodriguez Jesus Camacho Rodriguez added a comment - Julian Hyde , could you review this one? https://github.com/apache/calcite/pull/519 I would like to make sure I have touched all the right parts and I did not miss anything. We will need to extend Avatica in a follow-up, thus I changed the DruidAdapterIT tests that were reading the 'timestamp with timezone' column accordingly with a cast (at least for the time being). Once Avatica is extended and version consumed by Calcite bumped up, this restriction does not need to exist anymore. Another pending item is the extension of the SQL parser, etc. However, this patch should be enough to allow systems such as Hive consume / use the timestamp with timezone type.

            People

            • Assignee:
              jcamachorodriguez Jesus Camacho Rodriguez
              Reporter:
              jcamachorodriguez Jesus Camacho Rodriguez
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development