Description
Before HIVE-17877, each DPP sink had only one target work. The output path of a DPP work was TMP_PATH/targetWorkId/dppWorkId, and during pruning each map work read the DPP outputs under TMP_PATH/targetWorkId.
After HIVE-17877, each DPP sink can have multiple target works, so a map work may need to read DPP outputs from multiple TMP_PATH/targetWorkId directories. To solve this, I think we can use a DPP output path specific to each query, e.g. QUERY_TMP_PATH/dpp_output. Each DPP work then writes to QUERY_TMP_PATH/dpp_output/dppWorkId, and each map work reads from QUERY_TMP_PATH/dpp_output.
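The two layouts can be compared with a small sketch. This is only an illustration of the path scheme described above, not the actual Hive code: the class name, method names, and the sample IDs are all hypothetical, and plain string concatenation stands in for whatever path type the real implementation uses.

```java
public class DppPathSketch {
    // Directory name taken from the example in the description; hypothetical constant.
    static final String DPP_OUTPUT_DIR = "dpp_output";

    // Old layout: TMP_PATH/targetWorkId/dppWorkId (one target work per DPP sink).
    static String oldDppOutputPath(String tmpPath, String targetWorkId, String dppWorkId) {
        return tmpPath + "/" + targetWorkId + "/" + dppWorkId;
    }

    // Proposed layout: a single per-query directory that every map work reads from.
    static String dppBaseDir(String queryTmpPath) {
        return queryTmpPath + "/" + DPP_OUTPUT_DIR;
    }

    // Proposed layout: each DPP work writes to QUERY_TMP_PATH/dpp_output/dppWorkId,
    // independent of how many target works the sink has.
    static String dppOutputPath(String queryTmpPath, String dppWorkId) {
        return dppBaseDir(queryTmpPath) + "/" + dppWorkId;
    }

    public static void main(String[] args) {
        String queryTmp = "/tmp/query_1"; // hypothetical QUERY_TMP_PATH
        System.out.println(oldDppOutputPath("/tmp", "map_1", "dpp_7"));
        System.out.println(dppBaseDir(queryTmp));
        System.out.println(dppOutputPath(queryTmp, "dpp_7"));
    }
}
```

With this scheme the output path no longer contains a target work ID, so a DPP work shared by several map works has exactly one output location.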
Attachments
Issue Links
- is broken by: HIVE-17877 HoS: combine equivalent DPP sink works (Closed)
- relates to: HIVE-19895 The unique ID in SparkPartitionPruningSinkOperator is no longer needed (Open)