SPARK-18108: Partition discovery fails with explicitly written long partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.1, 2.2.0
    • Component/s: SQL
    • Labels: None

    Description

      We have Parquet data written from Spark 1.6 that, when read with Spark 2.0.1, produces errors.

      case class A(a: Long, b: Int)
      val as = Seq(A(1, 2))
      // partition directory written explicitly in the path, not via partitionBy
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      // reading the parent directory triggers partition discovery
      spark.read.parquet("/data/").collect
      

      The above code fails; stack trace attached.

      If an Int is used for the partition column instead, the explicitly written partition is discovered successfully.

      case class A(a: Int, b: Int)
      val as = Seq(A(1, 2))
      // same explicit partition path, but 'a' is now an Int
      spark.createDataFrame(as).write.parquet("/data/a=1/")
      spark.read.parquet("/data/").collect
      

      Here the collect succeeds. Additionally, if partitionBy is used instead of writing the partition path explicitly, partition discovery also succeeds; a sketch follows below.
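
      For comparison, a sketch of the working partitionBy variant (the output path /data2/ is illustrative, not from this report):

      case class A(a: Long, b: Int)
      // Spark creates the partition directory (/data2/a=1) itself
      spark.createDataFrame(Seq(A(1, 2))).write.partitionBy("a").parquet("/data2/")
      // discovery succeeds: 'a' exists only in the directory name, so there is
      // no type conflict with the schema stored in the Parquet data files
      spark.read.parquet("/data2/").collect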

      Question: is the first example a reasonable use case? PartitioningUtils seems to default to IntegerType unless the partition value exceeds the Int range, which presumably conflicts with the LongType recorded in the Parquet files themselves.
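
      If so, one possible workaround (an untested sketch, not from this report) is to bypass inference by supplying the full schema on read:

      import org.apache.spark.sql.types.{IntegerType, LongType, StructField, StructType}
      // declare 'a' as LongType up front so discovery does not have to guess
      val schema = StructType(Seq(
        StructField("a", LongType),
        StructField("b", IntegerType)))
      spark.read.schema(schema).parquet("/data/").collect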

      Attachments

        1. stacktrace.out (9 kB), uploaded by Richard Moorhead


          People

            Assignee: Takeshi Yamamuro (maropu)
            Reporter: Richard Moorhead (richard.moorhead)
            Votes: 4
            Watchers: 5
