  1. Apache Tez
  2. TEZ-3405

Support ability for AM to kill itself if there is no client heartbeating to it


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      HiveServer2 optionally maintains a pool of AMs in either Tez or LLAP mode. This is done to amortize the cost of launching a Tez session.

      We also try, in a shutdown hook, to kill all these AMs when HS2 goes down. However, there are cases where HS2 doesn't get the chance to kill them before it goes away. As a result, these zombie AMs hang around until the timeout kicks in.

      The trouble with the timeout is that we have to set it fairly high. Otherwise, in a lightly loaded cluster, the AMs time out between queries and the benefit of pre-launching them goes away.

      So, if people kill or restart HS2, they often run into situations where the cluster/queue has no capacity left for new AMs. They either have to kill the zombies manually or wait for them to time out.

      The request is therefore for Tez to maintain a heartbeat to the client. If the client goes away, the AM should exit. That way we can keep the AMs alive for a long time regardless of activity, and at the same time we don't have to worry about them if HS2 goes down.
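
      The requested behavior can be illustrated with a minimal sketch. This is not the code from the attached patches and not Tez's actual API: the class name ClientHeartbeatMonitor, its methods, and the timeout and poll parameters are all hypothetical. It assumes the client pings the AM periodically and that the AM runs a watchdog that triggers a shutdown callback once no ping has arrived within the timeout.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch of an AM-side client heartbeat check (not Tez code). */
class ClientHeartbeatMonitor {
    private final long timeoutMillis;
    private final LongSupplier clock;            // injectable so the logic is testable
    private final AtomicLong lastHeartbeatMillis;

    ClientHeartbeatMonitor(long timeoutMillis, LongSupplier clock) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        // Treat AM startup as the first heartbeat so a fresh AM is not killed at once.
        this.lastHeartbeatMillis = new AtomicLong(clock.getAsLong());
    }

    /** Called whenever a heartbeat from the client (e.g. HS2) reaches the AM. */
    void onClientHeartbeat() {
        lastHeartbeatMillis.set(clock.getAsLong());
    }

    /** True once more than timeoutMillis has passed since the last heartbeat. */
    boolean clientIsGone() {
        return clock.getAsLong() - lastHeartbeatMillis.get() > timeoutMillis;
    }

    /**
     * Polls every pollMillis on a daemon thread and runs onClientGone once
     * when the client is considered gone, then stops polling.
     */
    ScheduledExecutorService startWatchdog(long pollMillis, Runnable onClientGone) {
        ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "client-heartbeat-watchdog");
                t.setDaemon(true);
                return t;
            });
        scheduler.scheduleAtFixedRate(() -> {
            if (clientIsGone()) {
                onClientGone.run();   // in a real AM this would start an orderly shutdown
                scheduler.shutdown(); // cancels further periodic runs
            }
        }, pollMillis, pollMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

      In this sketch the heartbeat timeout is independent of any idle-session timeout, so the latter could stay high to keep pre-launched AMs warm while the former stays short enough to reclaim AMs soon after HS2 disappears.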

      Attachments

        1. TEZ-3405.1.patch
          19 kB
          Hitesh Shah
        2. TEZ-3405.2.patch
          25 kB
          Hitesh Shah
        3. TEZ-3405.3.patch
          34 kB
          Hitesh Shah
        4. TEZ-3405.4.patch
          35 kB
          Hitesh Shah
        5. TEZ-3405.5.patch
          44 kB
          Hitesh Shah
        6. TEZ-3405.6.patch
          44 kB
          Hitesh Shah
        7. TEZ-3405.7.patch
          43 kB
          Hitesh Shah

        Activity

          People

            Assignee: Hitesh Shah
            Reporter: Gunther Hagleitner
            Votes: 0
            Watchers: 6
