SPARK-12744

Inconsistent behavior parsing JSON with unix timestamp values

      Description

      Consider the following JSON:

      val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
      

      Spark SQL casts an int to a timestamp by treating the value as a number of seconds since the Unix epoch (see https://issues.apache.org/jira/browse/SPARK-11724):

      scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
      +--------------------+
      |                  ts|
      +--------------------+
      |2016-01-10 01:37:...|
      +--------------------+
      

      However, parsing the same JSON with an explicit schema gives a different result. The value appears to be interpreted as milliseconds rather than seconds:

      scala> val schema = (new StructType).add("ts", TimestampType)
      schema: org.apache.spark.sql.types.StructType = StructType(StructField(ts,TimestampType,true))
      
      scala> sqlContext.read.schema(schema).json(rdd).show
      +--------------------+
      |                  ts|
      +--------------------+
      |1970-01-17 20:26:...|
      +--------------------+
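
      The two outputs above are consistent with the same number being read in two different units. The following is a minimal sketch of that arithmetic using only java.time, not Spark's own parsing code; the object name TimestampUnits is hypothetical. It prints UTC instants, so the hour differs from the shell output above, which appears to use a UTC+1 session time zone.

```scala
import java.time.Instant

// Hypothetical helper, for illustration only: interprets the raw JSON
// number from the report both as epoch seconds and as epoch milliseconds.
object TimestampUnits {
  val raw: Long = 1452386229L

  // Interpretation matching the cast(TimestampType) result (seconds).
  val asSeconds: Instant = Instant.ofEpochSecond(raw)

  // Interpretation matching the result of reading with an explicit schema.
  val asMillis: Instant = Instant.ofEpochMilli(raw)

  def main(args: Array[String]): Unit = {
    println(asSeconds) // 2016-01-10T00:37:09Z
    println(asMillis)  // 1970-01-17T19:26:26.229Z
  }
}
```

      Reading the value as seconds lands on 2016-01-10 and reading it as milliseconds lands on 1970-01-17, the same dates as the two tables in this report.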
      

            People

            • Assignee: Anatoliy Plastinin (antlypls)
            • Reporter: Anatoliy Plastinin (antlypls)