Parquet / PARQUET-651

parquet-avro fails to correctly decode an array of records with a single field named "element"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.8.0, 1.8.1, 1.9.0
    • Fix Version/s: 1.9.0, 1.8.2
    • Component/s: parquet-avro
    • Labels:
      None

      Description

      Found this issue while investigating SPARK-16344.

      For the following Parquet schema

      message root {
        optional group f (LIST) {
          repeated group list {
            optional group element {
              optional int64 element;
            }
          }
        }
      }
      

      parquet-avro decodes it as something like this:

      record SingleElement {
        long element;
      }
      
      record NestedSingleElement {
        SingleElement element;
      }
      
      record Spark16344Wrong {
        array<NestedSingleElement> f;
      }
      

      while the correct interpretation should be:

      record SingleElement {
        long element;
      }
      
      record Spark16344 {
        array<SingleElement> f;
      }
      

      The reason is that the element syntactic group for LIST in

      <list-repetition> group <name> (LIST) {
        repeated group list {
          <element-repetition> <element-type> element;
        }
      }
      

      is recognized as a record field named element. The problematic code lies in AvroRecordConverter.isElementType(). We should probably check for the standard 3-level layout before falling back to the legacy 2-level layout.
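
      The suggested ordering can be illustrated with a minimal, self-contained sketch. This is not the parquet-avro code and does not reflect the fix that was eventually committed: Node, isStandardThreeLevel and elementOf are hypothetical names, Node is a simplified stand-in for a Parquet type, and matching on the literal names "list" and "element" is an assumption made for illustration.

```java
import java.util.List;

public class ListLayoutSketch {

    /** Hypothetical, simplified stand-in for a Parquet type (not the parquet-mr API). */
    static final class Node {
        final String name;
        final boolean repeated;
        final List<Node> fields; // empty for primitive types

        Node(String name, boolean repeated, Node... fields) {
            this.name = name;
            this.repeated = repeated;
            this.fields = List.of(fields);
        }

        boolean isGroup() {
            return !fields.isEmpty();
        }
    }

    /**
     * Assumed test for the standard 3-level layout: a repeated group named
     * "list" that wraps exactly one non-repeated field named "element".
     */
    static boolean isStandardThreeLevel(Node repeatedType) {
        return repeatedType.repeated
            && repeatedType.isGroup()
            && repeatedType.name.equals("list")
            && repeatedType.fields.size() == 1
            && repeatedType.fields.get(0).name.equals("element")
            && !repeatedType.fields.get(0).repeated;
    }

    /** Returns the node that represents one element of the list. */
    static Node elementOf(Node repeatedType) {
        // Check the standard layout first: the wrapper group is purely
        // syntactic, so the element is its single child.
        if (isStandardThreeLevel(repeatedType)) {
            return repeatedType.fields.get(0);
        }
        // Legacy 2-level fallback: the repeated type is itself the element.
        return repeatedType;
    }

    public static void main(String[] args) {
        // Schema from this issue:
        // repeated group list { optional group element { optional int64 element; } }
        Node leaf = new Node("element", false);
        Node record = new Node("element", false, leaf);
        Node list = new Node("list", true, record);

        Node element = elementOf(list);
        System.out.println("element is a record with "
            + element.fields.size() + " field(s)");
    }
}
```

      With the standard check applied first, the inner group (the one corresponding to SingleElement) is returned as the element, rather than the syntactic wrapper being treated as a record with a field named element.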


              People

              • Assignee:
                rdblue Ryan Blue
                Reporter:
                lian cheng Cheng Lian