Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Versions: 1.1.0, 2.3.4, 3.1.2
Description
When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” parameter), several ConditionalTasks remove their unselected tasks at the same time. This removal is not thread-safe, so some tasks may not be removed from the dependent task tree. This is a very serious bug: it can leave some stages never triggered for execution.
In our production cluster, a query ran three conditional tasks in parallel. After applying the patch from HIVE-21638, we found that Stage-3 was missed and never submitted to the runnable list because its parent Stage-31 was not done. But Stage-31 should have been removed, since it was not selected.
The stage dependencies are shown below:
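The race described above comes from several threads editing the same parent list at once. The following is a minimal sketch of one way to make such removals safe, not Hive's actual code: StageNode and ParallelPruneSketch are hypothetical names, and the choice of synchronized methods is an assumption made for illustration. Three worker threads stand in for the three conditional tasks, and each removes its unselected stages from the parent list of Stage-3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a task node; this is not Hive's Task class.
final class StageNode {
    private final String name;
    // Shared mutable state: several threads may edit this list at once.
    private final List<StageNode> parents = new ArrayList<>();

    StageNode(String name) { this.name = name; }

    String name() { return name; }

    // Every access to the parent list takes the same monitor,
    // so concurrent removals cannot corrupt or lose each other's updates.
    synchronized void addParent(StageNode p) { parents.add(p); }

    synchronized void removeParent(StageNode p) { parents.remove(p); }

    synchronized List<String> parentNames() {
        List<String> names = new ArrayList<>();
        for (StageNode p : parents) {
            names.add(p.name());
        }
        return names;
    }
}

final class ParallelPruneSketch {
    // Returns the parents of Stage-3 that remain after three threads
    // have each removed one group of unselected stages.
    static List<String> pruneInParallel() {
        StageNode stage3 = new StageNode("Stage-3");
        Map<String, StageNode> byName = new HashMap<>();
        String[] initialParents = {
            "Stage-2", "Stage-12", "Stage-16", "Stage-20", "Stage-23", "Stage-24",
            "Stage-27", "Stage-28", "Stage-31", "Stage-32", "Stage-35", "Stage-36"
        };
        for (String n : initialParents) {
            StageNode node = new StageNode(n);
            byName.put(n, node);
            stage3.addParent(node);
        }

        // One group per conditional task (Stage-29, Stage-33, Stage-37 in the report).
        String[][] unselected = {
            {"Stage-27", "Stage-28"},
            {"Stage-31", "Stage-32"},
            {"Stage-35", "Stage-36"}
        };
        ExecutorService pool = Executors.newFixedThreadPool(unselected.length);
        for (String[] group : unselected) {
            pool.submit(() -> {
                for (String n : group) {
                    stage3.removeParent(byName.get(n));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pruning did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while pruning", e);
        }
        return stage3.parentNames();
    }

    public static void main(String[] args) {
        System.out.println(pruneInParallel());
    }
}
```

With the list guarded this way, none of the removed stages (such as Stage-31) can be left behind in the parent list of Stage-3, regardless of how the three threads interleave.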
STAGE DEPENDENCIES:
  Stage-41 is a root stage
  Stage-26 depends on stages: Stage-41
  Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
  Stage-39 has a backup stage: Stage-2
  Stage-23 depends on stages: Stage-39
  Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
  Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
  Stage-5
  Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
  Stage-51 depends on stages: Stage-0
  Stage-4
  Stage-6
  Stage-7 depends on stages: Stage-6
  Stage-40 has a backup stage: Stage-2
  Stage-24 depends on stages: Stage-40
  Stage-2
  Stage-44 is a root stage
  Stage-30 depends on stages: Stage-44
  Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
  Stage-42 has a backup stage: Stage-12
  Stage-27 depends on stages: Stage-42
  Stage-43 has a backup stage: Stage-12
  Stage-28 depends on stages: Stage-43
  Stage-12
  Stage-47 is a root stage
  Stage-34 depends on stages: Stage-47
  Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
  Stage-45 has a backup stage: Stage-16
  Stage-31 depends on stages: Stage-45
  Stage-46 has a backup stage: Stage-16
  Stage-32 depends on stages: Stage-46
  Stage-16
  Stage-50 is a root stage
  Stage-38 depends on stages: Stage-50
  Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
  Stage-48 has a backup stage: Stage-20
  Stage-35 depends on stages: Stage-48
  Stage-49 has a backup stage: Stage-20
  Stage-36 depends on stages: Stage-49
  Stage-20
The stage execution log is below. Stage-33 is a conditional task that consists of Stage-45, Stage-46, and Stage-16. Stage-16 was launched, so Stage-45 and Stage-46 should have been removed from the dependent tree. Stage-31 is a child of Stage-45 and a parent of Stage-3, so Stage-31 should have been removed too. Yet the log shows that Stage-31 is still in the parent list of Stage-3, which should not happen.
2020-12-03T01:09:50,939 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
2020-12-03T01:09:50,940 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
2020-12-03T01:09:50,941 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
2020-12-03T01:09:50,943 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
2020-12-03T01:09:50,944 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
2020-12-03T01:10:32,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
2020-12-03T01:10:34,946 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
2020-12-03T01:10:34,947 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
2020-12-03T01:10:34,948 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
2020-12-03T01:10:34,949 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
2020-12-03T01:10:34,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-20:MAPRED] in parallel
2020-12-03T01:10:34,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
2020-12-03T01:10:36,950 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 8 out of 17
2020-12-03T01:10:36,951 INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
2020-12-01T22:20:17,774 INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
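The expected pruning for the Stage-33 branch can be illustrated with a small single-threaded model. This is a sketch under stated assumptions rather than Hive's implementation: PruneNode and BranchPruneSketch are hypothetical names, and the rule "a child left with no parents is detached as well" is an assumption chosen to match the behavior the report expects. Only the Stage-33 branch is modeled, so Stage-3 has three parents here instead of twelve.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type for illustration; not Hive's Task class.
final class PruneNode {
    final String name;
    final List<PruneNode> parents = new ArrayList<>();
    final List<PruneNode> children = new ArrayList<>();

    PruneNode(String name) { this.name = name; }

    // Link this node to a child in both directions.
    void addChild(PruneNode child) {
        children.add(child);
        child.parents.add(this);
    }

    // Detach this node from each child. Assumption: a child left with
    // no parents can never run, so it is detached from its children too.
    void detachFromChildren() {
        for (PruneNode child : children) {
            child.parents.remove(this);
            if (child.parents.isEmpty()) {
                child.detachFromChildren();
            }
        }
    }
}

final class BranchPruneSketch {
    // Models Stage-33 choosing Stage-16 and returns the parents left on Stage-3.
    static List<String> parentsOfStage3AfterPruning() {
        PruneNode stage45 = new PruneNode("Stage-45");
        PruneNode stage46 = new PruneNode("Stage-46");
        PruneNode stage16 = new PruneNode("Stage-16");
        PruneNode stage31 = new PruneNode("Stage-31");
        PruneNode stage32 = new PruneNode("Stage-32");
        PruneNode stage3 = new PruneNode("Stage-3");

        stage45.addChild(stage31);
        stage46.addChild(stage32);
        stage31.addChild(stage3);
        stage32.addChild(stage3);
        stage16.addChild(stage3);

        // Stage-16 is selected, so the other two alternatives are pruned.
        stage45.detachFromChildren();
        stage46.detachFromChildren();

        List<String> names = new ArrayList<>();
        for (PruneNode p : stage3.parents) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parentsOfStage3AfterPruning());
    }
}
```

In this model, removing Stage-45 leaves Stage-31 without parents, so Stage-31 is in turn removed from the parent list of Stage-3. That is the state the log should have shown instead of "Parent:Stage-31:MAPRED isDone:false".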