[SPARK-40169] Fix the issue with Parquet column index and predicate pushdown in Data source V1

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.3.1, 3.2.3, 3.4.0
Fix Version/s: 3.3.1, 3.2.3, 3.4.0
Component/s: SQL
Labels: None

Description

This is a follow-up to SPARK-39833. In https://github.com/apache/spark/pull/37419, we disabled the Parquet column index because of correctness issues found when filtering on a partition column that overlaps with the data schema.

This ticket tracks a permanent and thorough fix for the issue and the re-enablement of the column index. See the PR linked above for more details.

Issue Links

relates to

SPARK-39833 Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true (Resolved)

links to

[Github] Pull Request #37881 (sunchao)

People

Assignee: Chao Sun

Reporter: Ivan Sadikov

Votes: 0

Watchers: 3

Dates

Created: 21/Aug/22 22:47

Updated: 16/Sep/22 17:51

Resolved: 16/Sep/22 17:46