IMPALA-4826

Impala should ignore the root schema's repetition in Parquet

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Backend

    Description

      See https://issues.apache.org/jira/browse/PARQUET-843. parquet-cpp was generating files that set the root schema's repetition to REPEATED, which threw off Impala's schema resolution so that it couldn't read the files. PARQUET-843 includes an example file.

      The field description in parquet.thrift explicitly says that the root schema's repetition should be unset (https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L231), but other tools appear to write various values there.

      We should just ignore the repetition on the root schema, since it's meaningless.
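
A minimal sketch of the proposed behavior, not Impala's actual C++ backend code. It assumes a simplified stand-in for the flattened, depth-first schema list found in a Parquet file footer; the names `SchemaElement`, `SchemaNode`, and `build_schema_tree` are hypothetical. The root element's repetition is discarded, so a file whose root says REPEATED resolves to the same tree as one that leaves it unset.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List, Optional


class Repetition(Enum):
    REQUIRED = 0
    OPTIONAL = 1
    REPEATED = 2


@dataclass
class SchemaElement:
    """Hypothetical, simplified stand-in for a flattened schema entry."""
    name: str
    num_children: int = 0
    repetition_type: Optional[Repetition] = None


@dataclass
class SchemaNode:
    """Hypothetical resolved node with its maximum repetition/definition levels."""
    name: str
    repetition: Optional[Repetition]
    max_rep_level: int
    max_def_level: int
    children: List["SchemaNode"] = field(default_factory=list)


def _next_element(it: Iterator[SchemaElement]) -> SchemaElement:
    elem = next(it, None)
    if elem is None:
        raise ValueError("schema ended before all children were read")
    return elem


def _build_node(it: Iterator[SchemaElement], rep_level: int, def_level: int) -> SchemaNode:
    elem = _next_element(it)
    if elem.repetition_type is None:
        raise ValueError(f"non-root field {elem.name!r} has no repetition")
    # REPEATED raises both levels; OPTIONAL raises only the definition level.
    if elem.repetition_type is Repetition.REPEATED:
        rep_level += 1
        def_level += 1
    elif elem.repetition_type is Repetition.OPTIONAL:
        def_level += 1
    node = SchemaNode(elem.name, elem.repetition_type, rep_level, def_level)
    for _ in range(elem.num_children):
        node.children.append(_build_node(it, rep_level, def_level))
    return node


def build_schema_tree(elements: List[SchemaElement]) -> SchemaNode:
    """Build a schema tree from a depth-first element list, ignoring the root's repetition."""
    it = iter(elements)
    root_elem = _next_element(it)
    # Whatever the writer put in the root's repetition is dropped here:
    # the root always starts at repetition level 0 and definition level 0.
    root = SchemaNode(root_elem.name, None, 0, 0)
    for _ in range(root_elem.num_children):
        root.children.append(_build_node(it, 0, 0))
    return root
```

With this approach, a writer that marks the root as REPEATED does not shift the levels of any column beneath it; only the repetition of non-root fields contributes.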

          People

            Assignee: Gabor Kaszab (gaborkaszab)
            Reporter: Tim Armstrong (tarmstrong)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: