Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
Description
Currently, a multi-table insert generates 1+N tasks: one task that produces the input, plus N insert tasks that each read from that input and write to a separate output table.
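For illustration, a multi-table insert of this shape might look like the following HiveQL (table and column names are hypothetical):

```sql
-- One scan of src feeds two insert branches (N = 2),
-- each writing to a separate output table.
FROM src
INSERT OVERWRITE TABLE dest1
  SELECT key, value
  WHERE key < 100
INSERT OVERWRITE TABLE dest2
  SELECT key, count(value)
  WHERE key >= 100
  GROUP BY key;
```

Here the scan of `src` corresponds to the "1" task, and the two `INSERT` branches correspond to the N tasks described above.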
To make these N tasks run in parallel, we currently rely on hive.exec.parallel being set to true. In this patch, we propose an alternative: combine the N tasks into a single task containing N separate operator trees, which at execution time produce N result RDDs. Spark may then be able to execute these N RDDs in parallel internally, without relying on hive.exec.parallel.