phoenix-spark: Allow coercion of DATE fields to TIMESTAMP when loading DataFrames

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.7.0
    • Fix Version/s: 4.8.0
    • Component/s: None
    • Labels: None
    • Patch

    Description

      The Phoenix DATE type is internally represented as 8 bytes, which can store a full 'yyyy-MM-dd hh:mm:ss' date and time. However, Spark SQL follows the SQL Date spec and keeps only the 'yyyy-MM-dd' portion as a 4-byte type. As a result, when loading Phoenix DATE columns using the Spark DataFrame API, the 'hh:mm:ss' component is lost.

      This patch allows setting a new 'dateAsTimestamp' option when loading a DataFrame, which will coerce the underlying Date object to a Timestamp so that the full time component is loaded.
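
      As a rough illustration of the loss described above (this is not code from the patch), the following Java sketch round-trips an epoch-millisecond value through a whole-day count, which is approximately how a 4-byte, day-granularity date behaves. The class and method names are hypothetical, and UTC is assumed for simplicity.

```java
import java.time.Instant;

public class DateCoercionSketch {
    // Milliseconds in one day (24 * 60 * 60 * 1000).
    public static final long MILLIS_PER_DAY = 86_400_000L;

    // Hypothetical helper: reduce epoch milliseconds to whole days since
    // 1970-01-01 (UTC), the kind of value a 4-byte date can hold.
    public static int toEpochDays(long epochMillis) {
        return (int) Math.floorDiv(epochMillis, MILLIS_PER_DAY);
    }

    // Hypothetical helper: expand a day count back to epoch milliseconds.
    // The result always falls on midnight UTC.
    public static long fromEpochDays(int epochDays) {
        return epochDays * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        // An 8-byte millisecond value carrying both date and time.
        long original = Instant.parse("2016-03-18T13:45:30Z").toEpochMilli();

        // Round trip through the day count: the time of day is discarded.
        long viaDayCount = fromEpochDays(toEpochDays(original));

        System.out.println("kept as milliseconds: " + Instant.ofEpochMilli(original));
        System.out.println("via 4-byte day count: " + Instant.ofEpochMilli(viaDayCount));
    }
}
```

      Keeping the value in a millisecond-precision type end to end, which is what coercing to a Timestamp amounts to, avoids the truncation step entirely.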

      Attachments

        1. PHOENIX-2784.patch
          7 kB
          Josh Mahonin

        Activity

          jmahonin Josh Mahonin added a comment -

          Anyone want to do a quick review of this? Ping enis maghamravikiran@gmail.com ndimiduk

          maghamravikiran@gmail.com maghamravikiran added a comment -

          jmahonin The patch looks good. +1

          jmahonin Josh Mahonin added a comment -

          Thanks maghamravikiran@gmail.com. I'll hold off on committing, as I'm going off the grid tomorrow for a few weeks. I think it's pretty low risk though if anyone else wants to take the reins on it.

          ndimiduk Nick Dimiduk added a comment -

          I'm not following the intended use-case here. Date, Time, and Timestamp are all different types with different precisions. Instead of overriding the meaning through configuration, the type information in the schema should be preserved throughout. If you want Timestamp, use it in your schema.

          jamestaylor James R. Taylor added a comment -

          ndimiduk - in JDBC the Timestamp type is derived from the Date type. Hence it's fine to allow a Date to be used where a Timestamp is; you'll just have millisecond precision. We encourage Phoenix users to use Date instead of Timestamp because it performs much better and 99% of the time you don't need nano precision (which is what Timestamp gives you above and beyond what you get from Date).

          jamestaylor James R. Taylor added a comment -

          Unless there are further objections, how about getting this committed, jmahonin?

          hudson Hudson added a comment -

          SUCCESS: Integrated in Phoenix-master #1224 (See https://builds.apache.org/job/Phoenix-master/1224/)
          PHOENIX-2784 phoenix-spark: Allow coercion of DATE to TIMESTAMP for (jmahonin: rev 98e783cf6d7b4644660f48961795699d5374ce3f)

          • phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala
          • phoenix-spark/src/main/scala/org/apache/phoenix/spark/DefaultSource.scala
          • phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRelation.scala
          • phoenix-spark/src/main/scala/org/apache/phoenix/spark/PhoenixRDD.scala

          People

            Assignee: jmahonin Josh Mahonin
            Reporter: jmahonin Josh Mahonin
            Votes: 0
            Watchers: 4
