Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Incomplete
-
None
-
None
Description
- Case 1 - read.json(rdd)
val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""") val schema = new StructType().add("a", CalendarIntervalType) spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()
should throw an exception before the execution.
- Case 2 - read.json(path
val path = "/tmp/a" val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path) val schema = new StructType().add("a", CalendarIntervalType) spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()
should throw an exception before the execution.
- Case 3 - read.csv(path)
val path = "/tmp/b" val rdd = spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path) val schema = new StructType().add("a", CalendarIntervalType) spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()
should throw an exception before the execution.
- Case 4 - read.text(path)
val path = "/tmp/c" val rdd = spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path) val schema = new StructType().add("a", LongType) spark.read.schema(schema).text(path).show()
should throw an exception before the execution rather than printing incorrect values.
+-----------+ | a| +-----------+ |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476739| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| |68719476738| +-----------+
- Case 5 - from_json
import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ import spark.implicits._ val df = Seq("""{"a" 1}""").toDS() val schema = new StructType().add("a", CalendarIntervalType) df.select(from_json($"value", schema)).show()
prints
+-------------------+
|jsontostruct(value)|
+-------------------+
| null|
+-------------------+
This should throw analysis exception as CalendarIntervalType is not supported.
Likewise to_json throws an analysis error, for example,
val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a") .select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c")) df.select(to_json($"c")).collect()