  1. Hive
  2. HIVE-24467

ConditionalTask has a thread-safety problem when removing tasks that were not selected

Details

    Description

      When Hive executes jobs in parallel (controlled by the “hive.exec.parallel” parameter), ConditionalTasks remove their unselected tasks in parallel. Because this removal is not thread-safe, some tasks may not be removed from the dependency tree. This is a very serious bug: it causes some stage tasks never to be triggered for execution.
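
      A minimal Java sketch of the kind of guard that avoids this, assuming the parent list of a shared child task is a plain ArrayList that several threads mutate at once. SharedChildSketch, removeParent and removeConcurrently are hypothetical names used for illustration only; this is not Hive's Task API or the actual patch. It only shows that serializing the removals leaves no stale parent behind.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for a child task shared by several conditional branches. */
final class SharedChildSketch {
    // Plain ArrayList: not safe for concurrent mutation unless every access is guarded.
    private final List<String> parents = new ArrayList<>();

    SharedChildSketch(List<String> initialParents) {
        parents.addAll(initialParents);
    }

    /** Guarded removal: only one thread mutates the list at a time. */
    synchronized boolean removeParent(String parentId) {
        return parents.remove(parentId);
    }

    synchronized List<String> parentsSnapshot() {
        return new ArrayList<>(parents);
    }

    /** Removes each id in toRemove on its own thread and returns the parents that remain. */
    static List<String> removeConcurrently(List<String> initial, List<String> toRemove) {
        SharedChildSketch child = new SharedChildSketch(initial);
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (String id : toRemove) {
            Thread t = new Thread(() -> {
                try {
                    start.await(); // release all threads together to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                child.removeParent(id);
            });
            threads.add(t);
            t.start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for removals", e);
        }
        return child.parentsSnapshot();
    }

    public static void main(String[] args) {
        List<String> initial = Arrays.asList("Stage-2", "Stage-16", "Stage-31", "Stage-32");
        System.out.println(removeConcurrently(initial, Arrays.asList("Stage-31", "Stage-32")));
    }
}
```

      Without the synchronized keyword on removeParent, two threads calling ArrayList.remove at the same time may interleave, and a removal can be lost. That is the symptom described in this issue.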

      In our production cluster, a query ran three conditional tasks in parallel. After applying the patch from HIVE-21638, we found that Stage-3 was missed and never submitted to the runnable list, because its parent Stage-31 was not done. But Stage-31 should have been removed, since it was not selected.

      The stage dependencies are below:

      STAGE DEPENDENCIES:
        Stage-41 is a root stage
        Stage-26 depends on stages: Stage-41
        Stage-25 depends on stages: Stage-26 , consists of Stage-39, Stage-40, Stage-2
        Stage-39 has a backup stage: Stage-2
        Stage-23 depends on stages: Stage-39
        Stage-3 depends on stages: Stage-2, Stage-12, Stage-16, Stage-20, Stage-23, Stage-24, Stage-27, Stage-28, Stage-31, Stage-32, Stage-35, Stage-36
        Stage-8 depends on stages: Stage-3 , consists of Stage-5, Stage-4, Stage-6
        Stage-5
        Stage-0 depends on stages: Stage-5, Stage-4, Stage-7
        Stage-51 depends on stages: Stage-0
        Stage-4
        Stage-6
        Stage-7 depends on stages: Stage-6
        Stage-40 has a backup stage: Stage-2
        Stage-24 depends on stages: Stage-40
        Stage-2
        Stage-44 is a root stage
        Stage-30 depends on stages: Stage-44
        Stage-29 depends on stages: Stage-30 , consists of Stage-42, Stage-43, Stage-12
        Stage-42 has a backup stage: Stage-12
        Stage-27 depends on stages: Stage-42
        Stage-43 has a backup stage: Stage-12
        Stage-28 depends on stages: Stage-43
        Stage-12
        Stage-47 is a root stage
        Stage-34 depends on stages: Stage-47
        Stage-33 depends on stages: Stage-34 , consists of Stage-45, Stage-46, Stage-16
        Stage-45 has a backup stage: Stage-16
        Stage-31 depends on stages: Stage-45
        Stage-46 has a backup stage: Stage-16
        Stage-32 depends on stages: Stage-46
        Stage-16
        Stage-50 is a root stage
        Stage-38 depends on stages: Stage-50
        Stage-37 depends on stages: Stage-38 , consists of Stage-48, Stage-49, Stage-20
        Stage-48 has a backup stage: Stage-20
        Stage-35 depends on stages: Stage-48
        Stage-49 has a backup stage: Stage-20
        Stage-36 depends on stages: Stage-49
        Stage-20
      

      The stage execution log is below. Stage-33 is a conditional task that consists of Stage-45, Stage-46 and Stage-16. Stage-16 was launched, so Stage-45 and Stage-46 should be removed from the dependency tree. Stage-31 is a child of Stage-45 and a parent of Stage-3, so Stage-31 should be removed too. However, in the log below, Stage-31 is still in the parent list of Stage-3, which should not happen.
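
      The expected pruning for the Stage-33 branch can be sketched in Java as follows. This is a simplified, single-threaded model under stated assumptions: ConditionalPruneSketch, addEdge, prune and resolve are hypothetical names, only a subset of Stage-3's parents is modelled, and the rule "a child left with no parents is pruned in turn" is an assumption drawn from the description above rather than a copy of Hive's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of part of the stage graph; not Hive's API. */
final class ConditionalPruneSketch {
    private final Map<String, List<String>> parents = new LinkedHashMap<>();  // child -> parents
    private final Map<String, List<String>> children = new LinkedHashMap<>(); // parent -> children

    void addEdge(String parent, String child) {
        parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Detaches task from each of its children; a child left without parents is pruned too. */
    void prune(String task) {
        List<String> kids = children.remove(task);
        if (kids == null) {
            return;
        }
        for (String kid : kids) {
            List<String> ps = parents.get(kid);
            if (ps == null) {
                continue;
            }
            ps.remove(task);
            if (ps.isEmpty()) {
                parents.remove(kid);
                prune(kid);
            }
        }
    }

    /** Keeps the selected candidate, prunes the others, and returns the parents left on child. */
    List<String> resolve(List<String> candidates, String selected, String child) {
        for (String candidate : candidates) {
            if (!candidate.equals(selected)) {
                prune(candidate);
            }
        }
        List<String> remaining = parents.get(child);
        return remaining == null ? new ArrayList<>() : new ArrayList<>(remaining);
    }

    /** Builds only the Stage-33 branch from the dependency listing above. */
    static ConditionalPruneSketch stage33Branch() {
        ConditionalPruneSketch g = new ConditionalPruneSketch();
        g.addEdge("Stage-45", "Stage-31");
        g.addEdge("Stage-46", "Stage-32");
        for (String p : Arrays.asList("Stage-16", "Stage-31", "Stage-32")) {
            g.addEdge(p, "Stage-3");
        }
        return g;
    }

    public static void main(String[] args) {
        ConditionalPruneSketch g = stage33Branch();
        List<String> left = g.resolve(
                Arrays.asList("Stage-45", "Stage-46", "Stage-16"), "Stage-16", "Stage-3");
        System.out.println(left); // Stage-31 and Stage-32 should no longer be listed
    }
}
```

      In this model, selecting Stage-16 leaves Stage-16 as the only remaining parent of Stage-3 from this branch, which is the state the scheduler should have seen.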

      2020-12-03T01:09:50,939  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 1 out of 17
      2020-12-03T01:09:50,940  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-26:MAPRED] in parallel
      2020-12-03T01:09:50,941  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 2 out of 17
      2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-30:MAPRED] in parallel
      2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 3 out of 17
      2020-12-03T01:09:50,943  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-34:MAPRED] in parallel
      2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 4 out of 17
      2020-12-03T01:09:50,944  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-38:MAPRED] in parallel
      2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-29:CONDITIONAL] in parallel
      2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-33:CONDITIONAL] in parallel
      2020-12-03T01:10:32,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-37:CONDITIONAL] in parallel
      2020-12-03T01:10:34,946  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 5 out of 17
      2020-12-03T01:10:34,947  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-16:MAPRED] in parallel
      2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 6 out of 17
      2020-12-03T01:10:34,948  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-12:MAPRED] in parallel
      2020-12-03T01:10:34,949  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 7 out of 17
      2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-20:MAPRED] in parallel
      2020-12-03T01:10:34,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-25:CONDITIONAL] in parallel
      2020-12-03T01:10:36,950  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Launching Job 8 out of 17
      2020-12-03T01:10:36,951  INFO [HiveServer2-Background-Pool: Thread-87372] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
      
      2020-12-01T22:20:17,774  INFO [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Task:Stage-3:MAPRED Parent:Stage-31:MAPRED isDone:false
      2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage: Stage-3for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e
      2020-12-01T22:20:17,774 ERROR [HiveServer2-Background-Pool: Thread-233156] ql.Driver: Miss stage for queryid hive_20201201221740_805852c0-60a7-4141-96e9-196f83b2705e : FAILED: Some Execute Stage miss error
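
      The last three log lines suggest why a stale parent is fatal: a task is only scheduled once every parent is done, and a parent that was never launched never becomes done. A minimal Java sketch of such a readiness check, assuming parents are tracked by id and completion by a set of finished ids. RunnableCheckSketch and blockingParents are hypothetical names, not the code that produced these log lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical readiness check; illustrates the scheduling rule only. */
final class RunnableCheckSketch {
    /** Returns the parents that are not done yet; an empty list means the task may run. */
    static List<String> blockingParents(List<String> parents, Set<String> done) {
        List<String> blocking = new ArrayList<>();
        for (String parent : parents) {
            if (!done.contains(parent)) {
                blocking.add(parent);
            }
        }
        return blocking;
    }

    public static void main(String[] args) {
        Set<String> done = new HashSet<>(Arrays.asList("Stage-2", "Stage-16"));
        // Stale parent left behind by a lost removal: Stage-3 stays blocked forever.
        System.out.println(blockingParents(Arrays.asList("Stage-2", "Stage-16", "Stage-31"), done));
        // Parent list after a correct removal: nothing blocks Stage-3.
        System.out.println(blockingParents(Arrays.asList("Stage-2", "Stage-16"), done));
    }
}
```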
      

       

          People

            jshmchenxi Xi Chen
            gjhkael guojh

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 1h 20m