[SPARK-40775] V2 file scans have duplicative descriptions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.0
Fix Version/s: 3.4.0
Component/s: SQL
Labels: None

Description

V2 file scans repeat fields in their description. FileScan builds its description from the metadata, but each file type overrides both the metadata and the description and adds the same entries in both places, so those entries appear twice.

Example from a parquet agg pushdown explain:

+- BatchScan parquet file:/...[min(_3)#814, max(_3)#815, min(_1)#816, max(_1)#817, count#818L, count(_1)#819L, count(_2)#820L, count(_3)#821L] ParquetScan DataFilters: [], Format: parquet, Location: InMemoryFileIndex(1 paths)[file:/..., PartitionFilters: [], PushedAggregation: [MIN(_3), MAX(_3), MIN(_1), MAX(_1), COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedFilters: [], PushedGroupBy: [], ReadSchema: struct<min(_3):int,max(_3):int,min(_1):int,max(_1):int,count:bigint,count(_1):bigint,count(_2)..., PushedFilters: [], PushedAggregation: [MIN(_3), MAX(_3), MIN(_1), MAX(_1), COUNT(*), COUNT(_1), COUNT(_2), COUNT(_3)], PushedGroupBy: [] RuntimeFilters: []
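
The cause described above can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions: the class and method names mirror the ones mentioned in this issue, but the code is a hypothetical stand-in, not Spark's actual Scala implementation, and the metadata values are made up for illustration. The last class shows one possible way to avoid the duplication (overriding only the metadata), which is not necessarily how the linked pull request resolved it.

```python
class FileScan:
    """Simplified stand-in for the base file scan (hypothetical model)."""

    def metadata(self):
        # Illustrative values only.
        return {"Format": "parquet", "PartitionFilters": "[]"}

    def description(self):
        # The description is rendered from whatever metadata() returns,
        # including entries added by subclasses.
        fields = ", ".join(f"{k}: {v}" for k, v in sorted(self.metadata().items()))
        return f"{type(self).__name__} {fields}"


class ParquetScan(FileScan):
    """Models the bug: the same entry is added in two places."""

    def metadata(self):
        return {**super().metadata(), "PushedFilters": "[]"}

    def description(self):
        # PushedFilters is already part of metadata(), so appending it
        # here makes it show up a second time.
        return super().description() + ", PushedFilters: []"


class FixedParquetScan(FileScan):
    """One possible fix (an assumption): override only metadata()."""

    def metadata(self):
        return {**super().metadata(), "PushedFilters": "[]"}


buggy = ParquetScan().description()
fixed = FixedParquetScan().description()
print(buggy)
print(fixed)
```

In this model the buggy description lists PushedFilters twice, matching the repeated PushedFilters, PushedAggregation, and PushedGroupBy entries in the explain output above, while the variant that overrides only the metadata lists it once.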

Issue Links

links to

[Github] Pull Request #38229 (Kimahriman)

People

Assignee: Adam Binford

Reporter: Adam Binford

Votes: 0

Watchers: 3

Dates

Created: 12/Oct/22 21:10

Updated: 12/Dec/22 10:16

Resolved: 12/Dec/22 10:16