Apache Drill / DRILL-5183

Drill doesn't seem to handle array values correctly in Parquet files

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.17.0
    • Component/s: None
    • Labels: None

    Description

      It looks to me like Drill is not properly converting array values in Parquet records. I have created a simple example and attached a small Parquet file to this issue. The records are written using the following Avro schema:

      Book.avsc
      { "type": "record",
        "name": "Book",
        "fields": [
          { "name": "title", "type": "string" },
          { "name": "pages", "type": "int" },
          { "name": "authors", "type": {"type": "array", "items": "string"} }
        ]
      }
      

      When I write two records using this schema into the attached Parquet file and then run SELECT * FROM dfs.`books.parquet`, I get the following result:

      title                                | pages | authors
      Physics of Waves                     | 477   | {"array":["William C. Elmore","Mark A. Heald"]}
      Foundations of Mathematical Analysis | 428   | {"array":["Richard Johnsonbaugh","W.E. Pfaffenberger"]}

      The authors column is returned as a nested record with a single field named "array" instead of as a repeated value. If I change the SQL query to SELECT title,pages,t.authors.`array` FROM dfs.`/home/davek/src/drill-parquet-example/resources/books.parquet` t; then I get:

      title                                | pages | EXPR$2
      Physics of Waves                     | 477   | ["William C. Elmore","Mark A. Heald"]
      Foundations of Mathematical Analysis | 428   | ["Richard Johnsonbaugh","W.E. Pfaffenberger"]

      and now that column behaves in Drill as a repeated-value column.
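To make the difference between the two result shapes concrete, here is a minimal Python sketch that takes the rows exactly as Drill returned them for SELECT * and unwraps the single-field "array" record into a plain list, which is the shape I expected for the authors column. The helper names `unwrap_array_wrapper` and `unwrap_row` are hypothetical and exist only for this illustration; this is not how Drill reads Parquet internally, just a model of the expected output.

```python
import json

# Rows as returned by SELECT * in the report above: "authors" arrives as a
# nested record with one field named "array" instead of a repeated value.
rows = [
    {"title": "Physics of Waves", "pages": 477,
     "authors": {"array": ["William C. Elmore", "Mark A. Heald"]}},
    {"title": "Foundations of Mathematical Analysis", "pages": 428,
     "authors": {"array": ["Richard Johnsonbaugh", "W.E. Pfaffenberger"]}},
]


def unwrap_array_wrapper(value, wrapper="array"):
    """Return the inner list when value is a one-key record {wrapper: [...]}.

    Hypothetical helper for illustration; any other value is returned unchanged.
    """
    is_wrapper = (
        isinstance(value, dict)
        and list(value) == [wrapper]
        and isinstance(value[wrapper], list)
    )
    return value[wrapper] if is_wrapper else value


def unwrap_row(row):
    """Apply unwrap_array_wrapper to every column of a single row."""
    return {name: unwrap_array_wrapper(value) for name, value in row.items()}


# The shape I would expect Drill to return for the authors column.
expected_rows = [unwrap_row(row) for row in rows]

if __name__ == "__main__":
    for row in expected_rows:
        print(json.dumps(row))
```

Scalar columns such as title and pages pass through untouched, so only the wrapped array column changes shape.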

      Attachments

        1. books.parquet (0.9 kB, Dave Kincaid)

          People

            Assignee: Igor Guzenko (ihuzenko)
            Reporter: Dave Kincaid (dkincaid)