Description
Parquet 1.11 supports column indexes. Spark can support this feature to improve read performance.
More details:
https://issues.apache.org/jira/browse/PARQUET-1201
Benchmark result:
https://github.com/apache/spark/pull/31393#issuecomment-769767724
This feature is enabled by default, and users can disable it by setting parquet.filter.columnindex.enabled to false.
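As a rough illustration of the setting above, the following sketch builds the `spark-submit` arguments that would turn column index filtering off. It assumes the Parquet property is forwarded to the Hadoop configuration through Spark's `spark.hadoop.` prefix; the helper names and `app.py` are hypothetical and exist only for this example.

```python
import shlex

# Property name taken from the issue description (a Parquet-side setting).
PARQUET_COLUMN_INDEX_KEY = "parquet.filter.columnindex.enabled"


def column_index_conf(enabled: bool) -> dict:
    """Return Spark conf entries that toggle Parquet column index filtering.

    Assumption: the property is passed to the Hadoop configuration via
    Spark's `spark.hadoop.` prefix.
    """
    return {f"spark.hadoop.{PARQUET_COLUMN_INDEX_KEY}": str(enabled).lower()}


def to_submit_args(conf: dict) -> list:
    """Turn a conf dict into `--conf key=value` pairs for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # `app.py` is a placeholder application name.
    cmd = ["spark-submit"] + to_submit_args(column_index_conf(False)) + ["app.py"]
    print(shlex.join(cmd))
```

Building the arguments in one helper keeps the key name in a single place, which makes it easy to compare runs with the feature on and off.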
Issue Links
- relates to: SPARK-34859 Vectorized parquet reader needs synchronization among pages for column index (Resolved)
1. Upgrade built-in Hive to 2.3.8 | Resolved | Yuming Wang
2. Upgrade Jackson to 2.11.4 | Resolved | Yuming Wang
3. Upgrade parquet to 1.11.1 | Resolved | Yuming Wang
4. Upgrade to Avro 1.10.1 | Resolved | Ismaël Mejía
5. Do not push down partition filters to ParquetScan for DataSourceV2 | Resolved | Yang Jie
6. Vectorized reader support column index | Resolved | Yuming Wang