Spark / SPARK-18246

Throw an exception before execution for unsupported types in JSON, CSV and text functionalities


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SQL

    Description

      • Case 1 - read.json(rdd)
      val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""")
      val schema = new StructType().add("a", CalendarIntervalType)
      spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()
      

      This should throw an exception before execution.

      • Case 2 - read.json(path)
      val path = "/tmp/a"
      spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path)
      val schema = new StructType().add("a", CalendarIntervalType)
      spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()
      

      This should throw an exception before execution.

      • Case 3 - read.csv(path)
      val path = "/tmp/b"
      spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
      val schema = new StructType().add("a", CalendarIntervalType)
      spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()
      

      This should throw an exception before execution.

      • Case 4 - read.text(path)
      val path = "/tmp/c"
      spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
      val schema = new StructType().add("a", LongType)
      spark.read.schema(schema).text(path).show()
      

      This should throw an exception before execution rather than printing incorrect values:

      +-----------+
      |          a|
      +-----------+
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476739|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      |68719476738|
      +-----------+
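
The up-front check asked for in Case 4 could look roughly like the sketch below. It relies on the fact that the text source yields a single string column; `TextSchemaCheck` and `verifyTextSchema` are hypothetical names used for illustration and are not part of Spark's API.

```scala
import org.apache.spark.sql.types._

object TextSchemaCheck {
  // Hypothetical helper (not a Spark API): rejects any user-supplied schema
  // that is not exactly one StringType column, before any data is read.
  def verifyTextSchema(schema: StructType): Unit = {
    require(schema.length == 1,
      s"Text source supports a single column, but got ${schema.length} columns")
    val field = schema.head
    require(field.dataType == StringType,
      s"Text source supports only string columns, but column '${field.name}' " +
        s"has type ${field.dataType.simpleString}")
  }
}

// The schema from Case 4 is rejected with an IllegalArgumentException.
// TextSchemaCheck.verifyTextSchema(new StructType().add("a", LongType))
```

Calling such a check before `spark.read.schema(schema).text(path)` would fail fast instead of reinterpreting string bytes as a long.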
      
      • Case 5 - from_json
      import org.apache.spark.sql.types._
      import org.apache.spark.sql.functions._
      import spark.implicits._
      
      val df = Seq("""{"a" 1}""").toDS()
      val schema = new StructType().add("a", CalendarIntervalType)
      df.select(from_json($"value", schema)).show()
      

      This prints:

      +-------------------+
      |jsontostruct(value)|
      +-------------------+
      |               null|
      +-------------------+
      

      This should throw an AnalysisException, since CalendarIntervalType is not supported.

      By comparison, to_json already throws an analysis error for the same type, for example:

      val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a")
        .select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c"))
      df.select(to_json($"c")).collect()
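
A minimal sketch of the kind of pre-execution schema validation the cases above ask for, assuming the only type to reject is CalendarIntervalType. `SchemaCheck`, `verifyType` and `verifySchema` are hypothetical names chosen for illustration; this is not Spark's actual implementation, and a real fix would likely raise an AnalysisException from inside the data source or expression.

```scala
import org.apache.spark.sql.types._

object SchemaCheck {
  // Hypothetical helper (not a Spark API): walks a data type recursively and
  // fails on the first CalendarIntervalType it encounters.
  def verifyType(name: String, dataType: DataType): Unit = dataType match {
    case CalendarIntervalType =>
      throw new UnsupportedOperationException(
        s"Unsupported type ${dataType.simpleString} for field '$name'")
    case st: StructType =>
      st.fields.foreach(f => verifyType(f.name, f.dataType))
    case ArrayType(elementType, _) =>
      verifyType(name, elementType)
    case MapType(keyType, valueType, _) =>
      verifyType(name, keyType)
      verifyType(name, valueType)
    case _ => // every other type is accepted in this sketch
  }

  // Validates every top-level field of a user-supplied schema.
  def verifySchema(schema: StructType): Unit =
    schema.fields.foreach(f => verifyType(f.name, f.dataType))
}

// The schema used in Cases 1, 2, 3 and 5 is rejected before any job runs.
// SchemaCheck.verifySchema(new StructType().add("a", CalendarIntervalType))
```

Running the check when the schema is supplied, rather than while rows are parsed, is what makes the failure independent of the input data and of the parse mode.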
      

      People

        Assignee: Unassigned
        Reporter: Hyukjin Kwon (gurwls223)