Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Spark already optimizes reading of Hive Parquet tables using whole-stage code generation.
A similar approach could be used for scanning Hive ORC tables; currently, the standard Hive table scan is used.
ORC is sometimes preferred over Parquet in the Hive ecosystem because of better support and characteristics in certain scenarios.
Issue Links
- blocks: SPARK-20901 Feature parity for ORC with Parquet (Open)
- duplicates: SPARK-16060 Vectorized ORC reader (Resolved)