Description
The current Parquet schema for MapType is as follows, regardless of valueContainsNull:
message root { optional group a (MAP) { repeated group map (MAP_KEY_VALUE) { required int32 key; required int32 value; } } }
If the map contains a null value, a runtime exception is thrown.
To handle a MapType that contains null values, the schema should be as follows when valueContainsNull is true:
message root { optional group a (MAP) { repeated group map (MAP_KEY_VALUE) { required int32 key; optional int32 value; } } }
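The difference between the two schemas is only the repetition of the value field. A minimal sketch of that rule in plain Python follows; it is an illustration, not Spark's actual converter code, and the helper name map_schema and its parameters are hypothetical.

```python
def map_schema(field_name, key_type, value_type, value_contains_null):
    """Build a Parquet message-type string for a single map field.

    Hypothetical helper for illustration only; it is not part of Spark
    or Parquet. It mirrors the two schemas quoted in this issue.
    """
    # Keys are always required. Values become optional only when the
    # map is declared as possibly containing nulls.
    value_repetition = "optional" if value_contains_null else "required"
    return (
        "message root { "
        f"optional group {field_name} (MAP) {{ "
        "repeated group map (MAP_KEY_VALUE) { "
        f"required {key_type} key; "
        f"{value_repetition} {value_type} value; "
        "} } }"
    )


# Schema that cannot represent null values.
print(map_schema("a", "int32", "int32", value_contains_null=False))
# Schema proposed here for valueContainsNull = true.
print(map_schema("a", "int32", "int32", value_contains_null=True))
```

The two printed strings correspond to the current schema and the proposed schema quoted above.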
FYI:
Hive's Parquet writer always uses the latter schema, but its reader can read both schemas.
NOTICE:
This change will break backward compatibility when the schema is read from Parquet metadata ("org.apache.spark.sql.parquet.row.metadata").
Issue Links
- relates to SPARK-2721 Fix MapType compatibility issues with reading Parquet datasets (Resolved)