Spark / SPARK-12744

Inconsistent behavior parsing JSON with unix timestamp values

Details

    Description

      Consider the following JSON:

      val rdd = sc.parallelize("""{"ts":1452386229}""" :: Nil)
      

      Spark SQL casts an integer to a timestamp by treating the integer value as a number of seconds since the Unix epoch (see SPARK-11724):
      https://issues.apache.org/jira/browse/SPARK-11724

      scala> import org.apache.spark.sql.types._

      scala> sqlContext.read.json(rdd).select($"ts".cast(TimestampType)).show
      +--------------------+
      |                  ts|
      +--------------------+
      |2016-01-10 01:37:...|
      +--------------------+
      

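      However, parsing the same JSON with an explicit schema gives a different result, consistent with the value being interpreted as milliseconds rather than seconds: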

      scala> val schema = (new StructType).add("ts", TimestampType)
      schema: org.apache.spark.sql.types.StructType = StructType(StructField(ts,TimestampType,true))
      
      scala> sqlContext.read.schema(schema).json(rdd).show
      +--------------------+
      |                  ts|
      +--------------------+
      |1970-01-17 20:26:...|
      +--------------------+
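
      The two results can be reproduced outside Spark with plain java.time arithmetic. The following is a minimal sketch, not Spark's actual code path: it only shows that reading 1452386229 as seconds lands in January 2016, while reading the same number as milliseconds lands in January 1970. The names raw, asSeconds and asMillis are illustrative. The values below are in UTC; the tables in this report appear to have been rendered in a UTC+1 session time zone, which would explain the one-hour offset.

```scala
import java.time.Instant

// The raw value from the JSON document in this report.
val raw = 1452386229L

// Interpretation 1: seconds since the Unix epoch (matches the cast result).
val asSeconds = Instant.ofEpochSecond(raw)

// Interpretation 2: milliseconds since the Unix epoch (matches the schema result).
val asMillis = Instant.ofEpochMilli(raw)

println(asSeconds) // 2016-01-10T00:37:09Z
println(asMillis)  // 1970-01-17T19:26:26.229Z
```

      The factor of 1000 between the two interpretations accounts for the whole discrepancy, so no other difference in parsing needs to be assumed to explain the output.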
      

          People

            antlypls Anatoliy Plastinin