Details
- Type: Bug
- Status: Patch Available
- Priority: Major
- Resolution: Unresolved
Description
HIVE-16602 implemented shared scans with Tez: given a query plan, the goal is to identify scans on input tables that can be merged so that the data is read only once. The optimization is carried out at the physical level.
Hive on Spark (HoS) caches the result of a Spark work if that work is used by more than one child Spark work. Once sharedWorkOptimizer is enabled in the HoS physical plan, identical table scans are merged into a single table scan, and the result of that scan is used by more than one child Spark work. Because of the cache mechanism, the same computation does not need to be repeated for each child.
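The idea described above can be illustrated with a minimal, self-contained sketch. It is not Hive code: the class `SharedScanSketch`, the `Scan` type, and the methods `mergeScans` and `shouldCache` are hypothetical names made up for this example, and it assumes that two scans are "identical" when they read the same table (a real optimizer would presumably also compare projected columns, filters, and similar properties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types exist in Hive.
public class SharedScanSketch {

    /** Hypothetical stand-in for a table scan feeding one downstream work. */
    static final class Scan {
        final String table;     // table being read
        final String consumer;  // name of the child work that uses the scan

        Scan(String table, String consumer) {
            this.table = table;
            this.consumer = consumer;
        }
    }

    /**
     * Merges scans that read the same table into one entry.
     * Returns a map from table name to the child works consuming the merged scan.
     */
    static Map<String, List<String>> mergeScans(List<Scan> scans) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Scan s : scans) {
            merged.computeIfAbsent(s.table, t -> new ArrayList<>()).add(s.consumer);
        }
        return merged;
    }

    /** A merged scan benefits from caching only if more than one child uses it. */
    static boolean shouldCache(List<String> consumers) {
        return consumers.size() > 1;
    }

    public static void main(String[] args) {
        List<Scan> scans = Arrays.asList(
                new Scan("store_sales", "work1"),
                new Scan("store_sales", "work2"),
                new Scan("item", "work1"));

        // Print each merged scan, its consumers, and whether it would be cached.
        mergeScans(scans).forEach((table, consumers) ->
                System.out.println(table + " -> " + consumers
                        + (shouldCache(consumers) ? " (cache)" : "")));
    }
}
```

In this sketch the two scans of `store_sales` collapse into one entry with two consumers, which is the case where the HoS cache would avoid a second read; the single scan of `item` has one consumer and gains nothing from caching.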
Attachments
Issue Links
- is blocked by
  - HIVE-18289 Support Parquet/Orc format when enable rdd cache in Hive on Spark (Open)
  - HIVE-18301 Investigate to enable MapInput cache in Hive on Spark (Patch Available)
- links to