Details
Bug
Status: Resolved
Major
Resolution: Fixed
2.0.0
Description
We currently do not propagate child nullability correctly when reading Parquet files written by Spark 3.0.1 (parquet-mr 1.10.1).
For example, the following schema, taken from https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, is currently interpreted incorrectly:
// List<String> (list nullable, elements non-null)
optional group my_list (LIST) {
  repeated group list {
    required binary element (UTF8);
  }
}
The Arrow type should be:
Field::new(
    "my_list",
    DataType::List(Box::new(Field::new("element", DataType::Utf8, false))), // element: nullable = false
    true, // my_list: nullable = true
)
but we currently end up with:
Field::new(
    "my_list",
    DataType::List(Box::new(Field::new("list", DataType::Utf8, true))), // child: nullable = true
    true, // my_list: nullable = true
)
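To make the expected mapping concrete, here is a minimal, self-contained sketch of how the three-level LIST structure could be translated so that the element's name and nullability are preserved. The types below (ParquetType, Repetition, DataType, Field, to_field, example_schema) are simplified, hypothetical stand-ins written for illustration; they are not the parquet or arrow crate APIs, and every primitive is assumed to be a UTF8 binary.

```rust
// Hypothetical, simplified schema types for illustration only.
// These are NOT the `parquet` or `arrow` crate APIs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

#[derive(Debug, Clone, PartialEq)]
enum ParquetType {
    // Assumption: every primitive is a UTF8 binary.
    Primitive { name: String, repetition: Repetition },
    Group {
        name: String,
        repetition: Repetition,
        is_list: bool, // true when the group carries the LIST annotation
        children: Vec<ParquetType>,
    },
}

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    List(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

// Convert a primitive or a three-level LIST group into a Field.
// The child field is built from the innermost element, not from the
// intermediate repeated group, so its name and nullability survive.
fn to_field(ty: &ParquetType) -> Option<Field> {
    match ty {
        ParquetType::Primitive { name, repetition } => Some(Field {
            name: name.clone(),
            data_type: DataType::Utf8,
            nullable: *repetition == Repetition::Optional,
        }),
        ParquetType::Group { name, repetition, is_list: true, children } => {
            // Expect exactly one repeated group holding exactly one element.
            let element = match children.as_slice() {
                [ParquetType::Group { repetition: Repetition::Repeated, children: inner, .. }] => {
                    match inner.as_slice() {
                        [element] => element,
                        _ => return None,
                    }
                }
                _ => return None,
            };
            Some(Field {
                name: name.clone(),
                // Recursing here also covers nested lists.
                data_type: DataType::List(Box::new(to_field(element)?)),
                nullable: *repetition == Repetition::Optional,
            })
        }
        // Non-LIST groups are out of scope for this sketch.
        ParquetType::Group { .. } => None,
    }
}

// The my_list schema from the description above.
fn example_schema() -> ParquetType {
    ParquetType::Group {
        name: "my_list".to_string(),
        repetition: Repetition::Optional,
        is_list: true,
        children: vec![ParquetType::Group {
            name: "list".to_string(),
            repetition: Repetition::Repeated,
            is_list: false,
            children: vec![ParquetType::Primitive {
                name: "element".to_string(),
                repetition: Repetition::Required,
            }],
        }],
    }
}

fn main() {
    println!("{:?}", to_field(&example_schema()));
}
```

The key design choice in this sketch is that the intermediate repeated group contributes neither a name nor a nullability flag to the Arrow child field; both come from the element it wraps.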
This doesn't seem to be an issue on the master branch as of the time this issue was opened, so it might not be severe enough to try to force a fix into the 3.0.0 release.
I tested null and non-null Spark files and was able to read them correctly. The problem appears with nested lists, which I'm working on.