PARQUET-364

parquet-avro cannot decode Avro/Thrift arrays of primitive arrays (e.g. array<array<int>>)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.6.0, 1.7.0, 1.8.0
    • Fix Version/s: 1.8.2
    • Component/s: parquet-mr
    • Labels:
      None

      Description

      The problematic Avro and Thrift schemas are:

      record AvroArrayOfArray {
        array<array<int>> int_arrays_column;
      }
      

      and

      struct ThriftListOfList {
        1: list<list<i32>> intArraysColumn;
      }
      

      They are converted to the following structurally equivalent Parquet schemas by parquet-avro 1.7.0 and parquet-thrift 1.7.0 respectively:

      message AvroArrayOfArray {
        required group int_arrays_column (LIST) {
          repeated group array (LIST) {
            repeated int32 array;
          }
        }
      }
      

      and

      message ParquetSchema {
        required group intListsColumn (LIST) {
          repeated group intListsColumn_tuple (LIST) {
            repeated int32 intListsColumn_tuple_tuple;
          }
        }
      }
      

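      AvroIndexedRecordConverter cannot decode such records correctly. The second-level repeated group (array in the Avro-written schema, intListsColumn_tuple in the Thrift-written one) does not pass the AvroIndexedRecordConverter.isElementType() check, so it is treated as a synthetic wrapper around the element rather than as the element itself. To fix this, isElementType() should also check for the field name "array" and the field name suffix "_tuple".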
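      The proposed check can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual parquet-mr code: the Node class is a hypothetical stand-in for Parquet's schema types, and the first two rules (a repeated primitive, or a repeated group with more than one field, is the element itself) are assumed from the LIST backward-compatibility rules in the Parquet format specification. Only the name checks for "array" and the "_tuple" suffix come from this issue.

```java
import java.util.Arrays;
import java.util.List;

public class ElementTypeSketch {
    /** Hypothetical, simplified schema node; NOT the parquet-mr Type API. */
    static final class Node {
        final String name;
        final boolean repeated;
        final boolean primitive;
        final List<Node> fields;

        Node(String name, boolean repeated, boolean primitive, Node... fields) {
            this.name = name;
            this.repeated = repeated;
            this.primitive = primitive;
            this.fields = Arrays.asList(fields);
        }
    }

    /**
     * Returns true when the repeated node should be read as the list element
     * itself rather than as a synthetic wrapper around the element.
     */
    static boolean isElementType(Node repeatedType) {
        if (repeatedType.primitive) {
            return true; // assumed rule: a repeated primitive is the element
        }
        if (repeatedType.fields.size() > 1) {
            return true; // assumed rule: a multi-field group cannot be a wrapper
        }
        // Checks proposed in this issue: legacy names written by
        // parquet-avro ("array") and parquet-thrift ("*_tuple").
        return repeatedType.name.equals("array") || repeatedType.name.endsWith("_tuple");
    }

    public static void main(String[] args) {
        // Second-level repeated group from the parquet-avro schema above.
        Node avroInner = new Node("array", true, false,
                new Node("array", true, true));
        // Second-level repeated group from the parquet-thrift schema above.
        Node thriftInner = new Node("intListsColumn_tuple", true, false,
                new Node("intListsColumn_tuple_tuple", true, true));
        // A standard three-level list wrapper, which is not an element type.
        Node standardWrapper = new Node("list", true, false,
                new Node("element", false, true));

        System.out.println(isElementType(avroInner));       // true
        System.out.println(isElementType(thriftInner));     // true
        System.out.println(isElementType(standardWrapper)); // false
    }
}
```

      With a name-based check like this, the inner repeated groups of both legacy schemas are recognized as list elements, while a standard wrapper group named list is still unwrapped as usual.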

        Attachments

        1. bad-avro.parquet
          0.7 kB
          Cheng Lian
        2. bad-thrift.parquet
          1 kB
          Cheng Lian

              People

              • Assignee: Ryan Blue (rdblue)
              • Reporter: Cheng Lian (lian cheng)