[SPARK-18246] Throws an exception before execution for unsupported types in JSON, CSV and text functionalities

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: None
Fix Version/s: None
Component/s: SQL
Labels:
- bulk-closed

Description

Case 1 - read.json(rdd)

import org.apache.spark.sql.types._
val rdd = spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""")
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(rdd).show()

This should throw an exception before execution.

Case 2 - read.json(path)

import org.apache.spark.sql.types._
val path = "/tmp/a"
spark.sparkContext.parallelize(1 to 100).map(i => s"""{"a": "str$i"}""").saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").json(path).show()

This should throw an exception before execution.

Case 3 - read.csv(path)

import org.apache.spark.sql.types._
val path = "/tmp/b"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", CalendarIntervalType)
spark.read.schema(schema).option("mode", "FAILFAST").csv(path).show()

This should throw an exception before execution.
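
A minimal sketch of the kind of up-front check that cases 1 to 3 ask for, assuming the check is a recursive walk over the user-supplied schema. `SchemaCheck`, `containsInterval` and `verify` are hypothetical names used only for illustration; they are not part of Spark and not necessarily how the linked pull request does it. It throws `UnsupportedOperationException` rather than an analysis exception, since this is plain user-side code.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: rejects a schema that contains
// CalendarIntervalType before any data is read.
object SchemaCheck {
  // True if the type is, or nests, a CalendarIntervalType.
  def containsInterval(dt: DataType): Boolean = dt match {
    case CalendarIntervalType => true
    case s: StructType => s.fields.exists(f => containsInterval(f.dataType))
    case ArrayType(elementType, _) => containsInterval(elementType)
    case MapType(keyType, valueType, _) =>
      containsInterval(keyType) || containsInterval(valueType)
    case _ => false
  }

  // Fails on the first offending top-level field; `format` is only used in the message.
  def verify(schema: StructType, format: String): Unit = {
    schema.fields.foreach { f =>
      if (containsInterval(f.dataType)) {
        throw new UnsupportedOperationException(
          s"$format data source does not support ${f.dataType.simpleString} " +
            s"for column ${f.name}")
      }
    }
  }
}
```

Calling `SchemaCheck.verify(schema, "json")` before `spark.read.schema(schema)...` would fail immediately for the schemas used above, without launching a job.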

Case 4 - read.text(path)

import org.apache.spark.sql.types._
val path = "/tmp/c"
spark.sparkContext.parallelize(1 to 100).saveAsTextFile(path)
val schema = new StructType().add("a", LongType)
spark.read.schema(schema).text(path).show()

This should throw an exception before execution rather than printing incorrect values:

+-----------+
|          a|
+-----------+
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476739|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
|68719476738|
+-----------+
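
For the text case, a sketch of a pre-execution check could look like the following. It assumes the text source is only meant to produce a single string column, so any other user-supplied schema is rejected. `TextSchemaCheck` is a hypothetical name for illustration, not a Spark API.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: accepts only a schema with exactly
// one column of StringType, which is the assumed shape for the text source.
object TextSchemaCheck {
  def verify(schema: StructType): Unit = {
    val fields = schema.fields
    val ok = fields.length == 1 && fields(0).dataType == StringType
    if (!ok) {
      throw new UnsupportedOperationException(
        s"Text data source supports only a single string column, but got: ${schema.simpleString}")
    }
  }
}
```

With this in place, `TextSchemaCheck.verify(new StructType().add("a", LongType))` fails before any file is read, instead of the long column being filled with incorrect values.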

Case 5 - from_json

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("""{"a" 1}""").toDS()
val schema = new StructType().add("a", CalendarIntervalType)
df.select(from_json($"value", schema)).show()

prints

+-------------------+
|jsontostruct(value)|
+-------------------+
|               null|
+-------------------+

This should throw an analysis exception, because CalendarIntervalType is not supported.

to_json already throws an analysis error for this type, for example:

val df = Seq(Tuple1(Tuple1("interval -3 month 7 hours"))).toDF("a")
  .select(struct($"a._1".cast(CalendarIntervalType).as("a")).as("c"))
df.select(to_json($"c")).collect()

Issue Links

links to: [GitHub] Pull Request #15751 (HyukjinKwon)

People

Assignee: Unassigned
Reporter: Hyukjin Kwon
Votes: 0
Watchers: 1

Dates

Created: 03/Nov/16 09:45
Updated: 12/Dec/22 17:51
Resolved: 21/May/19 04:36