Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.0.0
Fix Version/s: None
Component/s: None
Description
While running integration tests with Arrow and Spark, I observed that Spark 2.x can, in some circumstances, write Parquet files with illegal nulls in non-nullable columns. (This appears to have been fixed in Spark 3.0.) Arrow throws an "Unexpected end of stream" error when attempting to read such illegal Parquet files.
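A minimal sketch of the failing read with pyarrow, assuming the attached file has been saved locally (the filename repro.parquet is a placeholder for the attachment):

import pyarrow.parquet as pq

try:
    table = pq.read_table("repro.parquet")
except Exception as exc:
    # The decode failure surfaces here; the message contains
    # "Unexpected end of stream".
    print(exc)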
The attached Parquet file written by Spark 2.0.0 can be used to repro this behavior. It contains only one column, a non-nullable integer named x, with three records:
+-----+
|    x|
+-----+
|    1|
| null|
|    3|
+-----+
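For context, a hedged sketch of one way such a file could arise under Spark 2.x; the exact write path in the original report is not specified, and verifySchema=False is used here only to get a null past PySpark's own driver-side check, after which the 2.x Parquet write path does not re-validate the values:

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

spark = SparkSession.builder.appName("illegal-nulls-repro").getOrCreate()

# Declare x as non-nullable, then supply a null row anyway.
schema = StructType([StructField("x", IntegerType(), nullable=False)])
df = spark.createDataFrame([(1,), (None,), (3,)], schema, verifySchema=False)

# Spark 2.x writes the column as Parquet "required" without checking
# the values, producing the illegal file described above.
df.write.parquet("repro.parquet")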
This issue is for awareness only. I expect this should be closed as "won't fix".