Description
This is a follow-up to SPARK-39833. In https://github.com/apache/spark/pull/37419, we disabled the Parquet column index because of correctness issues we found when filtering on a partition column that overlaps with the data schema.
This ticket tracks a permanent, thorough fix for the issue and the re-enablement of the column index. See the PR linked above for more details.
Issue Links
- relates to
  - SPARK-39833 Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true (Resolved)