Spark / SPARK-12744

Inconsistent behavior parsing JSON with unix timestamp values

      Description

      Consider the following JSON:

      val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
      

      Spark SQL casts an int to a timestamp by treating the int value as a number of seconds
      (see https://issues.apache.org/jira/browse/SPARK-11724):

      scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
      +--------------------+
      |                  ts|
      +--------------------+
      |2016-01-10 01:37:...|
      +--------------------+
      

      However, parsing the same JSON with an explicit schema gives a different result, because the integer is then treated as a number of milliseconds:

      scala> val schema = (new StructType).add("ts", TimestampType)
      schema: org.apache.spark.sql.types.StructType = StructType(StructField(ts,TimestampType,true))
      
      scala> sqlContext.read.schema(schema).json(rdd).show
      +--------------------+
      |                  ts|
      +--------------------+
      |1970-01-17 20:26:...|
      +--------------------+
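
The two results above can be reproduced without Spark by interpreting the same integer once as seconds and once as milliseconds since the Unix epoch. The following is a minimal sketch using only `java.time`; the object name `TimestampInterpretation` and its helper methods are hypothetical and are not part of Spark. It prints UTC instants, so the hour differs from the `show` output above, which presumably reflects the reporter's local time zone.

```scala
import java.time.Instant

// Hypothetical helper, not Spark code: illustrates the two interpretations only.
object TimestampInterpretation {
  // The integer from the JSON document in the description.
  val raw: Long = 1452386229L

  // Interpretation used by the cast: seconds since the Unix epoch.
  def asSeconds(value: Long): Instant = Instant.ofEpochSecond(value)

  // Interpretation matching the schema-based read: milliseconds since the Unix epoch.
  def asMillis(value: Long): Instant = Instant.ofEpochMilli(value)

  def main(args: Array[String]): Unit = {
    println(asSeconds(raw)) // 2016-01-10T00:37:09Z
    println(asMillis(raw))  // 1970-01-17T19:26:26.229Z
  }
}
```

The dates match the two tables: the seconds interpretation lands on 2016-01-10, while the milliseconds interpretation lands on 1970-01-17.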
      

          Activity

          apachespark Apache Spark added a comment -

          User 'antlypls' has created a pull request for this issue:
          https://github.com/apache/spark/pull/10687

          yhuai Yin Huai added a comment -

          This issue has been resolved by https://github.com/apache/spark/pull/10687.

          yhuai Yin Huai added a comment -

          Anatoliy Plastinin, can you add a comment summarizing the change? It will help us prepare the release notes.

          antlypls Anatoliy Plastinin added a comment -

          Yin Huai, how about: "The semantics of reading a JSON integer as a timestamp (when explicitly defined by the schema) have changed: the integer value is now treated as a number of seconds instead of milliseconds"?

          yhuai Yin Huai added a comment -

          Yea, that is great! Thank you!


            People

            • Assignee:
              antlypls Anatoliy Plastinin
              Reporter:
              antlypls Anatoliy Plastinin