[SPARK-40918] Mismatch between ParquetFileFormat and FileSourceScanExec in # columns for WSCG.isTooManyFields when using _metadata - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.3.2
Component/s: SQL
Labels: None

Description

_metadata columns are taken into account in FileSourceScanExec.supportColumnar, but not when the Parquet reader is created. As a result, the Parquet reader can produce columnar output (because its column count is within the WSCG.isTooManyFields limit), whereas FileSourceScanExec expects row output (because the extra metadata columns push it over the isTooManyFields limit).

I have a fix forthcoming.
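
The mismatch described above can be illustrated with a small, simplified model. This is only a sketch, not Spark's actual code: the function names `reader_returns_batches`, `scan_supports_columnar`, and `decisions_agree` are hypothetical, the limit of 100 is assumed to be the default of `spark.sql.codegen.maxFields`, and the field counts (98 data columns, 4 metadata fields) are illustrative values chosen to straddle the limit.

```python
# Simplified model of the two column-count checks; NOT Spark's implementation.
# Assumption: the limit mirrors the default of spark.sql.codegen.maxFields.
MAX_FIELDS = 100


def is_too_many_fields(num_fields, max_fields=MAX_FIELDS):
    """Models the isTooManyFields check: true when the count exceeds the limit."""
    return num_fields > max_fields


def reader_returns_batches(num_data_fields):
    """Hypothetical reader-side decision: counts only the data columns."""
    return not is_too_many_fields(num_data_fields)


def scan_supports_columnar(num_data_fields, num_metadata_fields):
    """Hypothetical scan-side decision: also counts the _metadata fields."""
    return not is_too_many_fields(num_data_fields + num_metadata_fields)


def decisions_agree(num_data_fields, num_metadata_fields):
    """True when reader and scan choose the same output format."""
    return reader_returns_batches(num_data_fields) == scan_supports_columnar(
        num_data_fields, num_metadata_fields
    )


if __name__ == "__main__":
    # Without metadata columns, both sides see 98 fields and agree.
    print(decisions_agree(98, 0))
    # With 4 illustrative metadata fields, the scan sees 102 fields and
    # wants rows, while the reader still sees 98 and emits columnar batches.
    print(decisions_agree(98, 4))
```

In this model the two sides disagree only when the data column count is at or just below the limit and the metadata fields push the total over it, which is the situation the description reports.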

Issue Links

is related to

ORC-1578 Fix SparkBenchmark according to SPARK-40918

Closed

links to

[Github] Pull Request #38397 (juliuszsompolski)

[Github] Pull Request #38431 (juliuszsompolski)

People

Assignee: Juliusz Sompolski

Reporter: Juliusz Sompolski

Votes: 0

Watchers: 3

Dates

Created: 26/Oct/22 11:46

Updated: 09/Jan/24 03:21

Resolved: 31/Oct/22 06:00