Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
3.5.0
Description
When scanning a Hive table, configuring the hadoop.mapred.max.split.size parameter (for example, lowering it so that the input is divided into more splits) increases the parallelism of the table scan stage and thereby reduces its running time.
However, when a large table and a small table appear in the same query, a single global hadoop.mapred.max.split.size value cannot suit both: some stages run a very large number of tasks, while others run very few. Allowing hadoop.mapred.max.split.size to be set separately for each Hive table at runtime would keep the task counts balanced.
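The imbalance can be illustrated with a rough back-of-the-envelope sketch. It assumes one task per split and ignores file and block boundaries, so it is only an approximation of how splits are actually computed. The table names, table sizes, and split sizes are hypothetical and chosen purely for illustration.

```python
import math

MB = 1024 * 1024


def estimated_tasks(table_bytes: int, max_split_bytes: int) -> int:
    """Rough task count: one task per split, ignoring file and block boundaries."""
    return max(1, math.ceil(table_bytes / max_split_bytes))


# Hypothetical table sizes (assumption, not taken from the issue).
tables = {
    "big_table": 1024 * 1024 * MB,  # 1 TiB
    "small_table": 512 * MB,        # 512 MiB
}

# One global max split size applied to every table scan.
global_split = 64 * MB
global_plan = {
    name: estimated_tasks(size, global_split) for name, size in tables.items()
}

# Hypothetical per-table max split sizes, as the issue proposes.
per_table_split = {"big_table": 256 * MB, "small_table": 16 * MB}
per_table_plan = {
    name: estimated_tasks(size, per_table_split[name])
    for name, size in tables.items()
}

print("global:   ", global_plan)
print("per-table:", per_table_plan)
```

With the single global value the large scan gets thousands of tasks while the small scan gets only a handful. Per-table values let the large table use bigger splits and the small table smaller ones, which narrows that gap.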