[SPARK-26345] Parquet support Column indexes - ASF JIRA

Parquet 1.11 supports column indexing. Spark can support this feature to improve read performance.

More details:

Benchmark result:

This feature is enabled by default, and users can disable it by setting parquet.filter.columnindex.enabled to false.
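As a rough illustration of the setting above, the sketch below builds the Spark configuration entry that would turn column-index filtering off. It assumes the property is passed through Spark's generic "spark.hadoop." prefix, which forwards a property to the Hadoop configuration used by the reader. The helper names and the application name are hypothetical and are not part of this issue.

```python
# Parquet property named in this issue; it is on by default.
PARQUET_COLUMN_INDEX_KEY = "parquet.filter.columnindex.enabled"


def column_index_conf(enabled: bool) -> dict:
    """Return the Spark config entry that toggles Parquet column-index filtering.

    Assumption: the property is forwarded through the "spark.hadoop." prefix,
    which copies it into the Hadoop Configuration seen by the Parquet reader.
    """
    return {f"spark.hadoop.{PARQUET_COLUMN_INDEX_KEY}": str(enabled).lower()}


def build_session(enabled: bool):
    """Create a SparkSession with the setting applied (hypothetical helper)."""
    # Imported lazily so that the helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("column-index-demo")  # hypothetical name
    for key, value in column_index_conf(enabled).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session(False) would start a session with column-index filtering disabled, which can help when comparing read performance with and without the feature.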

relates to

SPARK-34859 Vectorized parquet reader needs synchronization among pages for column index

Sub-tasks:

1.	Upgrade built-in Hive to 2.3.8	Resolved	Yuming Wang
2.	Upgrade Jackson to 2.11.4	Resolved	Yuming Wang
3.	Upgrade parquet to 1.11.1	Resolved	Yuming Wang
4.	Upgrade to Avro 1.10.1	Resolved	Ismaël Mejía
5.	Do not push down partition filters to ParquetScan for DataSourceV2	Resolved	Yang Jie
6.	Vectorized reader support column index	Resolved	Yuming Wang