[SPARK-18108] Partition discovery fails with explicitly written long partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version: 2.0.1
    • Fix Versions: 2.1.1, 2.2.0
    • Component: SQL
    • Labels: None

    Description

      We have Parquet data written from Spark 1.6 that, when read with Spark 2.0.1, produces errors.

      case class A(a: Long, b: Int)
      val as = Seq(A(1,2))
      // partition explicitly written
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      The above code fails; stack trace attached.

      If an Int is used instead, partition discovery over the explicitly written partition succeeds.

      case class A(a: Int, b: Int)
      val as = Seq(A(1,2))
      // partition explicitly written
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      This succeeds. Likewise, if 'partitionBy' is used instead of writing the partition directory explicitly, partition discovery succeeds.

      Question: Is the first example a reasonable use case? PartitioningUtils seems to default to IntegerType unless the partition value exceeds the integer range, so the explicitly written a=1 directory is presumably inferred as an Int and conflicts with the Long column recorded in the Parquet schema.
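      The inference behavior in question can be sketched as a fallback chain that tries the narrowest numeric type first and widens only when parsing fails. This is a simplified illustration, not Spark's actual PartitioningUtils code:

```scala
import scala.util.Try

// Simplified sketch of partition-value type inference: attempt Int,
// then Long, then Double, falling back to String. A partition value
// like "1" infers as Int even if the column was written as Long,
// which is the mismatch described above.
def inferPartitionType(raw: String): String =
  Try { raw.toInt; "IntegerType" }
    .orElse(Try { raw.toLong; "LongType" })
    .orElse(Try { raw.toDouble; "DoubleType" })
    .getOrElse("StringType")
```

      Under this sketch, inferPartitionType("1") yields "IntegerType", while a value outside the Int range such as "3000000000" yields "LongType", consistent with the failure appearing only when the Long partition value happens to fit in an Int.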

      Attachments

        Activity


          People

            Assignee: maropu (Takeshi Yamamuro)
            Reporter: richard.moorhead (Richard Moorhead)
            Votes: 4
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:
