[SPARK-25407] Spark throws a `ParquetDecodingException` when attempting to read a field from a complex type in certain cases of schema merging - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 3.0.0
Component/s: SQL
Labels: None

Description

Spark supports merging schemata across table partitions in which one partition is missing a subfield that is present in another. However, selecting that missing subfield in a query whose partition pruning predicate filters out every partition that contains the subfield throws a `ParquetDecodingException` when the query results are fetched.

This bug is specifically exercised by the failing (but ignored) test case https://github.com/apache/spark/blob/f2d35427eedeacceb6edb8a51974a7e8bbb94bc2/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala#L125-L131.
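The scenario described above can be sketched as a standalone program. This is a minimal illustration rather than the linked test case itself: the object name, case classes, sample rows, and temporary directory are hypothetical, and it assumes nested schema pruning is switched on through `spark.sql.optimizer.nestedSchemaPruning.enabled`. Partition `p=1` stores `name.middle`, partition `p=2` does not, and the query selects `name.middle` while pruning down to `p=2` only.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical names throughout; only the shape of the data matters.
object Spark25407Sketch {
  case class FullName(first: String, middle: String, last: String)
  case class BriefName(first: String, last: String)
  case class Contact(id: Int, name: FullName, address: String)
  case class BriefContact(id: Int, name: BriefName, address: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25407 sketch")
      // Assumption: the failure is tied to nested schema pruning being enabled.
      .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
      .getOrCreate()
    import spark.implicits._

    val base = Files.createTempDirectory("spark-25407").toString

    // Partition p=1: the name struct includes the `middle` subfield.
    Seq(Contact(0, FullName("Jane", "X.", "Doe"), "123 Main Street"))
      .toDS().write.parquet(s"$base/contacts/p=1")

    // Partition p=2: the name struct has no `middle` subfield.
    Seq(BriefContact(2, BriefName("Janet", "Jones"), "567 Maple Drive"))
      .toDS().write.parquet(s"$base/contacts/p=2")

    // Merge the two partition schemata so that name.middle resolves.
    spark.read
      .option("mergeSchema", "true")
      .parquet(s"$base/contacts")
      .createOrReplaceTempView("contacts")

    // The predicate p = 2 prunes away the only partition containing name.middle.
    val query = spark.sql("select name.middle, address from contacts where p = 2")

    // On affected versions this is the step where the exception is reported.
    query.show()

    spark.stop()
  }
}
```

On a version containing the fix, the expectation would be a null `middle` value for the rows in `p=2` rather than an exception, though that behaviour should be confirmed against the linked test case.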

Issue Links

causes

SPARK-31116 ParquetRowConverter does not follow case sensitivity

Resolved

is depended upon by

SPARK-31536 Backport SPARK-25407 Allow nested access for non-existent field for Parquet file when nested pruning is enabled

Resolved

is duplicated by

SPARK-25879 Schema pruning fails when a nested field and top level field are selected

Resolved

links to

[Github] Pull Request #22880 (mallman)

GitHub Pull Request #24307

People

Assignee: Michael MacFadden

Reporter: Michael MacFadden

Votes: 0

Watchers: 5

Dates

Created: 11/Sep/18 15:59

Updated: 12/Dec/22 18:10

Resolved: 08/Apr/19 13:29