PARQUET-1879

Apache Arrow cannot read a Parquet file written with Parquet-Avro 1.11.0 with a Map field




      This follows from my StackOverflow question about an issue I'm having getting Snowflake (Cloud DB) to load Parquet files written with Parquet-Avro 1.11.0.

      The problem only appears when using a map schema field in the Avro schema. For example:

          {
            "name": "FeatureAmounts",
            "type": {
              "type": "map",
              "values": "records.MoneyDecimal"
            }
          }
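      For anyone wanting to reproduce this, a minimal schema containing such a map field could be generated as sketched below. This is only an illustration: the record name `MapRepro` is hypothetical, and the definition of `records.MoneyDecimal` is an assumption inferred from the `fixed_len_byte_array(12)` / `DECIMAL(28,15)` column type in the Parquet schema in this report; the real Avro definition may differ.

```python
import json

# Assumed definition of records.MoneyDecimal, inferred from the Parquet column
# type fixed_len_byte_array(12) with DECIMAL(28,15); the real schema may differ.
money_decimal = {
    "type": "fixed",
    "name": "MoneyDecimal",
    "namespace": "records",
    "size": 12,
    "logicalType": "decimal",
    "precision": 28,
    "scale": 15,
}

# Hypothetical record that keeps only two fields from the issue.
schema = {
    "type": "record",
    "name": "MapRepro",
    "namespace": "records",
    "fields": [
        # Defining the fixed type inline here lets the map refer to it by name.
        {"name": "CostInUSD", "type": ["null", money_decimal], "default": None},
        {
            "name": "FeatureAmounts",
            "type": {"type": "map", "values": "records.MoneyDecimal"},
        },
    ],
}

schema_json = json.dumps(schema, indent=2)
print(schema_json)
```

      The printed JSON can be saved as an `.avsc` file and used with whichever Avro writer is under test.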

      When Parquet-Avro writes the file, it ends up with a bad Parquet schema, for example:

      message record.ResponseRecord {
        required binary GroupId (STRING);
        required int64 EntryTime (TIMESTAMP(MILLIS,true));
        required int64 HandlingDuration;
        required binary Id (STRING);
        optional binary ResponseId (STRING);
        required binary RequestId (STRING);
        optional fixed_len_byte_array(12) CostInUSD (DECIMAL(28,15));
        required group FeatureAmounts (MAP) {
          repeated group map (MAP_KEY_VALUE) {
            required binary key (STRING);
            required fixed_len_byte_array(12) value (DECIMAL(28,15));
          }
        }
      }

      From the great answer to my StackOverflow question, it seems the issue is that Parquet-Avro 1.11.0 still uses the legacy MAP_KEY_VALUE converted type, which has no logical type equivalent. From the comment on LogicalTypeAnnotation:

      // This logical type annotation is implemented to support backward compatibility with ConvertedType.
      // The new logical type representation in parquet-format doesn't have any key-value type,
      // thus this annotation is mapped to UNKNOWN. This type shouldn't be used.

      However, it seems the latest release, 1.11.0, still writes this annotation, which then causes Apache Arrow to fail with:

      Logical type Null can not be applied to group node

      It appears that Arrow only looks for the new logical types Map or List, so the legacy annotation on the inner group causes this error.
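      As a rough diagnostic, the printed schema text can be scanned for groups that still carry the legacy annotation. The sketch below is plain Python working on the schema string only; it does not use any Parquet or Arrow API, and the function name `find_legacy_map_groups` is made up for illustration. It assumes one declaration per line, as in the schema dumps in this report.

```python
import re

# Matches a group declaration line such as:
#   repeated group map (MAP_KEY_VALUE) {
GROUP_RE = re.compile(
    r"^\s*(required|optional|repeated)\s+group\s+(\w+)\s*(?:\((\w+)\))?\s*\{\s*$"
)


def find_legacy_map_groups(schema_text):
    """Return the names of groups annotated with the legacy MAP_KEY_VALUE type."""
    legacy = []
    for line in schema_text.splitlines():
        match = GROUP_RE.match(line)
        if match and match.group(3) == "MAP_KEY_VALUE":
            legacy.append(match.group(2))
    return legacy


# Map field as written by Parquet-Avro 1.11.0 in this report.
LEGACY_SCHEMA = """\
required group FeatureAmounts (MAP) {
  repeated group map (MAP_KEY_VALUE) {
    required binary key (STRING);
    required fixed_len_byte_array(12) value (DECIMAL(28,15));
  }
}
"""

# Map layout without an annotation on the repeated group.
EXPECTED_SCHEMA = """\
required group my_map (MAP) {
  repeated group key_value {
    required binary key (UTF8);
    optional int32 value;
  }
}
"""

print(find_legacy_map_groups(LEGACY_SCHEMA))    # ['map']
print(find_legacy_map_groups(EXPECTED_SCHEMA))  # []
```

      A non-empty result only indicates that the writer emitted the legacy annotation; whether a given reader rejects the file still depends on that reader.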

      I have seen in the parquet-format LogicalTypes documentation that a map should look something like:

      // Map<String, Integer>
      required group my_map (MAP) {
        repeated group key_value {
          required binary key (UTF8);
          optional int32 value;
        }
      }

      Is this on the correct path?


              maccamlc Matthew McMahon