Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.3.4
- Fix Version/s: None
- Component/s: None
- Labels: hive-2.3.4
Description
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” parameter), tasks are submitted to YARN in parallel. If several jobs complete simultaneously, their child task may be submitted more than once.
In our production cluster, we have a query with the following stage dependencies:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
  Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5
  Stage-18 is a root stage
  Stage-9 depends on stages: Stage-18
  Stage-10 depends on stages: Stage-9
  Stage-19 is a root stage
  Stage-13 depends on stages: Stage-19
  Stage-14 depends on stages: Stage-13
With some probability, Stage-10 and Stage-14 complete at the same time, and their child Stage-2 is then submitted twice, as the log below shows:
2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Launching Job 6 out of 6
2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Launching Job 7 out of 6
2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
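To make the suspected failure mode concrete, here is a minimal sketch in Java. It is not Hive's Driver code: the class ResubmissionSketch and its methods markFinished, onCompletion, and simulate are hypothetical names, and the interleaving it replays (both parents are already marked finished before either completion is processed) is an assumption about how the duplicate launch arises. It only models Stage-2 and its three parents from the dependency listing, and shows that a "submit at most once" guard removes the second launch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; none of these names exist in Hive.
class ResubmissionSketch {
    // Parents of Stage-2, taken from the STAGE DEPENDENCIES listing.
    static final List<String> STAGE_2_PARENTS =
            List.of("Stage-1", "Stage-10", "Stage-14");

    private final Set<String> finished = new HashSet<>();   // stages that are done
    private final Set<String> submitted = new HashSet<>();  // children already launched
    private final List<String> launches = new ArrayList<>(); // every launch of Stage-2
    private final boolean guarded;

    ResubmissionSketch(boolean guarded) {
        this.guarded = guarded;
    }

    void markFinished(String stage) {
        finished.add(stage);
    }

    // Called once per completed parent: launch Stage-2 if all its parents are done.
    void onCompletion(String parent) {
        if (!STAGE_2_PARENTS.contains(parent)) {
            return;
        }
        if (!finished.containsAll(STAGE_2_PARENTS)) {
            return; // some parent is still running
        }
        // Guard: Set.add returns false if Stage-2 was already submitted.
        if (guarded && !submitted.add("Stage-2")) {
            return;
        }
        launches.add("Stage-2");
    }

    // Replays the assumed interleaving and returns how often Stage-2 was launched.
    static int simulate(boolean guarded) {
        ResubmissionSketch s = new ResubmissionSketch(guarded);
        s.markFinished("Stage-1");
        s.onCompletion("Stage-1");
        // Stage-10 and Stage-14 finish "at the same time": both are marked
        // finished before either completion is processed.
        s.markFinished("Stage-10");
        s.markFinished("Stage-14");
        s.onCompletion("Stage-10");
        s.onCompletion("Stage-14");
        return s.launches.size();
    }

    public static void main(String[] args) {
        System.out.println("unguarded launches of Stage-2: " + simulate(false)); // 2
        System.out.println("guarded launches of Stage-2: " + simulate(true));    // 1
    }
}
```

In a real multi-threaded scheduler the guard would need to be atomic (for example a concurrent set or a compare-and-set flag on the task); the single-threaded replay above only demonstrates the check-then-launch ordering problem.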
Attachments
Issue Links
- Blocked: HIVE-25026 hive sql result is duplicate data cause of same task resubmission (Open)
- is duplicated by: HIVE-24578 Task resubmission bug (Resolved)