[SPARK-13101] Dataset complex types mapping to DataFrame (element nullability) mismatch - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 1.6.1
Fix Version/s: 1.6.1, 2.0.0
Component/s: SQL
Labels: None
Target Version/s: 1.6.1, 2.0.0

Description

There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By default, Spark maps a Scala Seq[Double] to an ArrayType whose elements are nullable:

 |-- valuations: array (nullable = true)
 |    |-- element: double (containsNull = true)
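The schema fragment above can be reproduced and inspected with Spark's type classes. The following is a minimal sketch, assuming spark-sql is on the classpath; the `ValuationSchema` object and the `elementsNullable` helper are illustrative names, not part of the report or of Spark.

```scala
import org.apache.spark.sql.types.{ArrayType, DoubleType, StructField, StructType}

// Hypothetical helper object, for illustration only.
object ValuationSchema {
  // The column as stored in the table: both the array and its elements are nullable.
  val schema: StructType = StructType(Seq(
    StructField("valuations", ArrayType(DoubleType, containsNull = true), nullable = true)
  ))

  // Returns Some(containsNull) for an array column, None for any other column type.
  def elementsNullable(s: StructType, column: String): Option[Boolean] =
    s(column).dataType match {
      case ArrayType(_, containsNull) => Some(containsNull)
      case _                          => None
    }

  def main(args: Array[String]): Unit = {
    schema.printTreeString()                         // prints the schema tree
    println(elementsNullable(schema, "valuations"))  // element nullability flag
  }
}
```

The `containsNull` flag is the second field of `ArrayType`, which is the value that differs between the two types named in the exception.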

In Spark 1.6.0 this table could be read back as a Dataset:

    val df = sqlContext.table("valuations").as[Valuation]

With Spark 1.6.1 the same statement fails:

    val df = sqlContext.table("valuations").as[Valuation]

org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as array<double>)' due to data type mismatch: cannot cast ArrayType(DoubleType,true) to ArrayType(DoubleType,false);
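The exception says that an array with nullable elements cannot be cast to one with non-nullable elements. The sketch below models only that rule, in plain Scala. `ArrayOf`, `NullabilityCheck` and both of its methods are hypothetical stand-ins, and this is not Spark's implementation.

```scala
// Simplified stand-in for an array type; this is not Spark's ArrayType class.
final case class ArrayOf(elementType: String, containsNull: Boolean)

object NullabilityCheck {
  // Nullable elements may not be narrowed to non-nullable ones;
  // every other combination is accepted.
  def nullabilityCompatible(fromNullable: Boolean, toNullable: Boolean): Boolean =
    !fromNullable || toNullable

  // A cast is allowed when the element types match and nullability is not narrowed.
  def canCast(from: ArrayOf, to: ArrayOf): Boolean =
    from.elementType == to.elementType &&
      nullabilityCompatible(from.containsNull, to.containsNull)

  def main(args: Array[String]): Unit = {
    val stored   = ArrayOf("double", containsNull = true)   // type of the table column
    val expected = ArrayOf("double", containsNull = false)  // cast target named in the error
    println(canCast(stored, expected)) // false: would drop element nullability
    println(canCast(expected, stored)) // true: widening nullability is harmless
  }
}
```

Under this model the read fails because the target type for `Seq[Double]` is stricter than the type the table was written with.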

Here are the classes I am using:

import org.apache.spark.sql.SaveMode
import sqlContext.implicits._  // needed for toDF / toDS outside the shell

case class Valuation(tradeId: String,
                     counterparty: String,
                     nettingAgreement: String,
                     wrongWay: Boolean,
                     valuations: Seq[Double], /* one per scenario */
                     timeInterval: Int,
                     jobId: String)  /* used for hdfs partitioning */

val vals: Seq[Valuation] = Seq()
val valsDF = sqlContext.sparkContext.parallelize(vals).toDF
valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations")

Even the following gives the same result:

val valsDF = vals.toDS.toDF

Issue Links

links to

[Github] Pull Request #11035 (cloud-fan)

[Github] Pull Request #11042 (cloud-fan)

People

Assignee: Wenchen Fan

Reporter: Deenar Toraskar

Votes: 0

Watchers: 8

Dates

Created: 30/Jan/16 08:58

Updated: 08/Feb/16 20:06

Resolved: 08/Feb/16 20:06