Parquet support should handle ArrayType when containsNull is true.
When containsNull is true, the schema should be as follows:
message root {
  optional group a (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
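To make the three-level layout concrete, here is a minimal sketch of how a list column would be written under it. It uses standard Parquet repetition and definition levels. The helper names `nullable_array_schema` and `shred` are hypothetical and are not SparkSQL or Hive APIs. The field name `a` and the `int32` element type are assumptions taken from the example schema. Because the element inside `bag` is `optional`, a null element gets its own definition level, 2. That level is distinct from a null list (0), an empty list (1) and a present value (3).

```python
from typing import List, Optional, Tuple


def nullable_array_schema(field: str, element_type: str = "int32") -> str:
    """Render the containsNull=true layout (hypothetical helper, for illustration)."""
    return (
        "message root {\n"
        f"  optional group {field} (LIST) {{\n"
        "    repeated group bag {\n"
        f"      optional {element_type} array_element;\n"
        "    }\n"
        "  }\n"
        "}"
    )


def shred(value: Optional[List[Optional[int]]]) -> List[Tuple[Optional[int], int, int]]:
    """Return (value, repetition_level, definition_level) for leaf a.bag.array_element.

    Definition level: 0 = list is null, 1 = list is empty,
    2 = element slot present but element is null, 3 = non-null element.
    Repetition level: 0 starts a new record, 1 continues the same list.
    """
    if value is None:
        return [(None, 0, 0)]
    if not value:
        return [(None, 0, 1)]
    out = []
    for i, elem in enumerate(value):
        rep = 0 if i == 0 else 1
        if elem is None:
            out.append((None, rep, 2))  # bag entry exists, element is null
        else:
            out.append((elem, rep, 3))  # fully defined element
    return out


print(nullable_array_schema("a"))
print(shred([1, None, 3]))  # [(1, 0, 3), (None, 1, 2), (3, 1, 3)]
```

Every element of the column carries both levels, up to a maximum definition level of 3. This extra nesting is what lets nulls inside the array be represented.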
Hive's Parquet writer always uses this schema, and its reader can only read data written with this schema. In other words, SparkSQL's current Parquet support is not compatible with Hive.
If Hive compatibility is the top priority, we would also have to use this schema regardless of containsNull, which would break backward compatibility.
However, using this schema could also affect performance.