Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: None
Description
While running integration tests with Arrow and Spark, I observed that Spark 2.x can, in some circumstances, write Parquet files with illegal nulls in non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow throws an "Unexpected end of stream" error when attempting to read such illegal Parquet files.
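A minimal sketch of the failing read with pyarrow, assuming the attached file has been saved locally (the filename repro.parquet is a placeholder for the attachment):

import pyarrow.parquet as pq

try:
    table = pq.read_table("repro.parquet")
except Exception as exc:
    # The decode failure surfaces here; the message contains
    # "Unexpected end of stream".
    print(exc)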
The attached Parquet file written by Spark 2.0.0 can be used to repro this behavior. It contains only one column, a non-nullable integer named x, with three records:
+-----+
|    x|
+-----+
|    1|
| null|
|    3|
+-----+
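For context, a hedged sketch of one way such a file could arise under Spark 2.x; the exact write path in the original report is not specified, and verifySchema=False is used here only to get a null past PySpark's own driver-side check, after which the 2.x Parquet write path does not re-validate the values:

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

spark = SparkSession.builder.appName("illegal-nulls-repro").getOrCreate()

# Declare x as non-nullable, then supply a null row anyway.
schema = StructType([StructField("x", IntegerType(), nullable=False)])
df = spark.createDataFrame([(1,), (None,), (3,)], schema, verifySchema=False)

# Spark 2.x writes the column as Parquet "required" without checking
# the values, producing the illegal file described above.
df.write.parquet("repro.parquet")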
This issue is for awareness only. I expect this should be closed as "won't fix".