[DRILL-5183] Drill doesn't seem to handle array values correctly in Parquet files - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.17.0
Component/s: None
Labels: None

Description

It looks to me that Drill is not properly converting array values in Parquet records. I have created a simple example and attached a small Parquet file to this issue. The records are written using the following Avro schema:

Book.avsc

{ "type": "record",
  "name": "Book",
  "fields": [
    { "name": "title", "type": "string" },
    { "name": "pages", "type": "int" },
    { "name": "authors", "type": {"type": "array", "items": "string"} }
  ]
}

I wrote two records using this schema into the attached Parquet file. When I then simply run SELECT * FROM dfs.`books.parquet`, I get the following result:

title	pages	authors
Physics of Waves	477	{"array":["William C. Elmore","Mark A. Heald"]}
Foundations of Mathematical Analysis	428	{"array":["Richard Johnsonbaugh","W.E. Pfaffenberger"]}

You can see that the authors column appears as a nested record with a single field named "array" instead of being a repeated value. If I change the SQL query to SELECT title,pages,t.authors.`array` FROM dfs.`/home/davek/src/drill-parquet-example/resources/books.parquet` t; then I get:

title	pages	EXPR$2
Physics of Waves	477	["William C. Elmore","Mark A. Heald"]
Foundations of Mathematical Analysis	428	["Richard Johnsonbaugh","W.E. Pfaffenberger"]

and now that column behaves in Drill as a repeated-value column.
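
To make the difference between the two result shapes concrete, here is a minimal Python sketch. It is not Drill code and says nothing about how Drill reads Parquet internally; it only parses the authors cells exactly as they were displayed above and shows that each one is a map with a single "array" key rather than a list. The helper name unwrap_authors is hypothetical and exists only for illustration.

```python
import json

# Rows copied from the SELECT * output above; the authors cell is the
# JSON text that Drill displayed for that column.
rows = [
    ("Physics of Waves", 477,
     '{"array":["William C. Elmore","Mark A. Heald"]}'),
    ("Foundations of Mathematical Analysis", 428,
     '{"array":["Richard Johnsonbaugh","W.E. Pfaffenberger"]}'),
]


def unwrap_authors(cell):
    """Hypothetical helper: return the list held in the "array" field."""
    value = json.loads(cell)
    # The displayed value is a map with one key, not a list.
    if isinstance(value, dict) and set(value) == {"array"}:
        return value["array"]
    # Already a plain list (the shape expected for a repeated column).
    return value


for title, pages, authors in rows:
    print(title, pages, unwrap_authors(authors))
```

Unwrapping the "array" key here plays the same role as selecting t.authors.`array` in the second query.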

Attachments

books.parquet
09/Jan/17 19:56
0.9 kB
Dave Kincaid

People

Assignee: Igor Guzenko

Reporter: Dave Kincaid

Votes: 0

Watchers: 4

Dates

Created: 09/Jan/17 19:56

Updated: 21/Oct/19 09:24

Resolved: 21/Oct/19 09:24
