Hive / HIVE-13858

LLAP: A preempted task can end up waiting on completeInitialization if some part of the executing code suppressed the interrupt

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.1.0
    • None

    Description

      Preempting a task involves two things: interrupting the task thread and calling HiveProcessor.abort.

      In this specific case, the task was in the middle of HDFS IO, which 'handled' the interrupt by retrying. That cleared the interrupt status on the thread, so instead of skipping the future.get in completeInitialization, the task blocked there.

      End result: a single executor slot is permanently blocked in LLAP. Depending on what else is running, this can cause a cluster-level deadlock.
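
      The failure mode described above can be reproduced in isolation. The sketch below is not the Hive code and not the attached patch: ioWithRetry, taskBlocks, and the aborted flag are hypothetical stand-ins for the HDFS retry path, the task body, and whatever state an abort call might record. The real completeInitialization has no timeout on future.get; one is added here only so the demo terminates. It shows that relying on the interrupt status alone fails once any code on the thread swallows the interrupt, while an explicit flag survives.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class SwallowedInterruptDemo {

    // Hypothetical stand-in for an IO layer that "handles" an interrupt by retrying.
    // Catching InterruptedException clears the thread's interrupt status, and
    // nothing here restores it with Thread.currentThread().interrupt().
    static void ioWithRetry() {
        for (int attempt = 0; ; attempt++) {
            try {
                // The first attempt is slow so the interrupt reliably lands during it.
                Thread.sleep(attempt == 0 ? 10_000 : 50);
                return;
            } catch (InterruptedException e) {
                // Swallowed: fall through and retry.
            }
        }
    }

    // Returns true if the task ended up blocked waiting on the future.
    static boolean taskBlocks(boolean checkAbortFlag) throws InterruptedException {
        CompletableFuture<Void> initFuture = new CompletableFuture<>(); // never completed
        AtomicBoolean aborted = new AtomicBoolean(false); // assumed abort-side state
        AtomicBoolean blocked = new AtomicBoolean(false);
        CountDownLatch inIo = new CountDownLatch(1);

        Thread task = new Thread(() -> {
            inIo.countDown();
            ioWithRetry(); // the preemption interrupt is consumed in here
            if (checkAbortFlag && aborted.get()) {
                return; // explicit flag still visible, so skip the wait
            }
            try {
                // The scenario in this issue waits without a timeout.
                initFuture.get(500, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                blocked.set(true); // interrupt status was lost, so get() blocked
            } catch (InterruptedException | ExecutionException e) {
                // Not reached in this demo: the only interrupt was already consumed.
            }
        });
        task.start();

        inIo.await();        // wait until the task is about to enter "IO"
        aborted.set(true);   // preemption, part 1: record the abort
        task.interrupt();    // preemption, part 2: interrupt the thread
        task.join();
        return blocked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupt only:  blocked = " + taskBlocks(false));
        System.out.println("with abort flag: blocked = " + taskBlocks(true));
    }
}
```

      Running main prints blocked = true for the interrupt-only case and blocked = false when the flag is checked before waiting. A library that catches InterruptedException and retries would normally be expected to re-assert the interrupt, but the caller cannot depend on that, which is why the sketch checks state it owns.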

      Attachments

        1. HIVE-13858.01.patch (4 kB, Siddharth Seth)
        2. HIVE-13858.02.patch (8 kB, Siddharth Seth)
        3. HIVE-13858.03.patch (9 kB, Siddharth Seth)

          People

            Assignee: Siddharth Seth (sseth)
            Reporter: Siddharth Seth (sseth)
            Votes: 0
            Watchers: 2
