Description
We have Parquet data written from Spark 1.6 that, when read from Spark 2.0.1, produces errors.
    case class A(a: Long, b: Int)
    val as = Seq(A(1, 2))
    // partition explicitly written
    spark.createDataFrame(as).write.parquet("/data/a=1/")
    spark.read.parquet("/data/").collect()
The above code fails; stack trace attached.
If an Int is used for the partition column instead, partition discovery over the explicitly written partition succeeds.
    case class A(a: Int, b: Int)
    val as = Seq(A(1, 2))
    // partition explicitly written
    spark.createDataFrame(as).write.parquet("/data/a=1/")
    spark.read.parquet("/data/").collect()
This action succeeds. Additionally, if partitionBy is used instead of an explicit partition-path write, partition discovery also succeeds; see the sketch below.
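For reference, a minimal sketch of the partitionBy variant that succeeds (same data as above; the path is assumed for illustration):

    case class A(a: Long, b: Int)
    val as = Seq(A(1, 2))
    // partitionBy encodes 'a' only in the directory name (/data/a=1/),
    // so the data files carry no Long column for 'a' that could
    // conflict with the type inferred from the path
    spark.createDataFrame(as).write.partitionBy("a").parquet("/data/")
    spark.read.parquet("/data/").collect()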
Question: Is the first example a reasonable use case? PartitioningUtils appears to infer IntegerType for a partition value unless the value exceeds the range of Int, in which case it falls back to a wider type such as LongType.
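A possible workaround (unverified on 2.0.1; the schema below is spelled out as an assumption) is to supply the schema on read so the partition column's type is not inferred from the path:

    import org.apache.spark.sql.types._

    // Declare 'a' as LongType explicitly so the reader does not infer
    // IntegerType from the path value a=1
    val schema = StructType(Seq(
      StructField("b", IntegerType, nullable = false),
      StructField("a", LongType, nullable = false)
    ))
    spark.read.schema(schema).parquet("/data/").collect()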