Description
Parquet support should handle ArrayType when containsNull is true.
When containsNull is true, the schema should be as follows:
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
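As a rough illustration of the layout above, the following sketch renders the same three-level schema text for an arbitrary column name and element type. The helper name `nullable_list_schema` is hypothetical and is not part of Spark or Parquet; it only builds a string and does not touch any Parquet API.

```python
def nullable_list_schema(field: str, element_type: str) -> str:
    """Render the three-level LIST layout described in this issue.

    Hypothetical helper for illustration only: the `bag` and
    `array_element` names are copied from the schema in the description.
    """
    lines = [
        "message root {",
        f"  optional group {field} (LIST) {{",
        # The repeated group holds one entry per array slot.
        "    repeated group bag {",
        # The element is optional, so a slot can be present but null.
        f"      optional {element_type} array_element;",
        "    }",
        "  }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(nullable_list_schema("a", "int32"))
```

The extra `bag` level is what lets a null element be recorded: the repeated group marks that a slot exists, while the optional `array_element` inside it may be absent.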
FYI:
Hive's Parquet writer always uses this schema, and its reader can read only this schema, so the current Parquet support in Spark SQL is not compatible with Hive.
NOTICE:
If Hive compatibility is the top priority, we also have to use this schema regardless of containsNull, which will break backward compatibility.
However, using this schema could affect performance.