Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 1.11.0
None
Description
From my StackOverflow question about an issue I'm having getting Snowflake (cloud DB) to load Parquet files written with version 1.11.0:
The problem only appears when using a map schema field in the Avro schema. For example:
{ "name": "FeatureAmounts", "type": { "type": "map", "values": "records.MoneyDecimal" } }
When Parquet-Avro writes the file, it produces a bad Parquet schema, for example:
message record.ResponseRecord {
  required binary GroupId (STRING);
  required int64 EntryTime (TIMESTAMP(MILLIS,true));
  required int64 HandlingDuration;
  required binary Id (STRING);
  optional binary ResponseId (STRING);
  required binary RequestId (STRING);
  optional fixed_len_byte_array(12) CostInUSD (DECIMAL(28,15));
  required group FeatureAmounts (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (STRING);
      required fixed_len_byte_array(12) value (DECIMAL(28,15));
    }
  }
}
From the great answer to my StackOverflow question, it seems the issue is that Parquet-Avro 1.11.0 still uses the legacy MAP_KEY_VALUE converted type, which has no logical type equivalent. From the comment on LogicalTypeAnnotation:
// This logical type annotation is implemented to support backward compatibility with ConvertedType.
// The new logical type representation in parquet-format doesn't have any key-value type,
// thus this annotation is mapped to UNKNOWN. This type shouldn't be used.
However, it seems the latest 1.11.0 still writes this annotation, which then causes Apache Arrow to fail with:
Logical type Null can not be applied to group node
It appears that Arrow only looks for the new logical types Map and List, so the legacy annotation on the inner group causes this error.
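To check whether a given schema is affected, one option is to scan its printed text form for groups that carry the legacy annotation. The following is a minimal Python sketch under stated assumptions: it is not part of Parquet or Arrow, the function name find_legacy_map_key_value is hypothetical, and it parses only the text layout shown in this report (one declaration per line). The `message example` wrapper in the second schema is also made up for illustration.

```python
import re

# Matches a group declaration line such as:
#   "repeated group map (MAP_KEY_VALUE) {"
# The annotation in parentheses is optional.
GROUP_LINE = re.compile(
    r"^(required|optional|repeated)\s+group\s+(\w+)(?:\s+\((\w+)\))?\s*\{$"
)


def find_legacy_map_key_value(schema_text):
    """Return dotted paths of groups annotated with legacy MAP_KEY_VALUE.

    Hypothetical helper: works on the printed schema text only,
    not on an actual Parquet file footer.
    """
    path = []  # names of the message and groups currently open
    hits = []
    for raw in schema_text.splitlines():
        line = raw.strip()
        if line.startswith("message ") and line.endswith("{"):
            path.append(line.split()[1])
            continue
        match = GROUP_LINE.match(line)
        if match:
            path.append(match.group(2))
            if match.group(3) == "MAP_KEY_VALUE":
                # Drop the message name from the reported path.
                hits.append(".".join(path[1:]))
        elif line == "}" and path:
            path.pop()
    return hits


# Trimmed version of the schema written by Parquet-Avro 1.11.0 above.
LEGACY = """message record.ResponseRecord {
  required group FeatureAmounts (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key (STRING);
      required fixed_len_byte_array(12) value (DECIMAL(28,15));
    }
  }
}"""

# Map layout without the legacy annotation on the inner group.
MODERN = """message example {
  required group my_map (MAP) {
    repeated group key_value {
      required binary key (UTF8);
      optional int32 value;
    }
  }
}"""

print(find_legacy_map_key_value(LEGACY))  # ['FeatureAmounts.map']
print(find_legacy_map_key_value(MODERN))  # []
```

A text scan like this only flags the annotation; it does not confirm that a particular reader will reject the file.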
I have seen in the parquet-format LogicalTypes documentation that a map should look something like:
// Map<String, Integer>
required group my_map (MAP) {
  repeated group key_value {
    required binary key (UTF8);
    optional int32 value;
  }
}
Is this on the correct path?