IMPALA-2132

Handle Llama expansions that are allocated after time out

Details

    Description

      Impala may make many expansion requests to Llama, some of which time out in the ResourceBroker. Llama doesn't know about the timeout and may still fulfill the request later. When that happens, Impala is notified and just logs a message that a request that timed out was fulfilled.

      The problem is that the resources are allocated to the reservation but are never accounted for or used by Impala, so Llama believes the reservation holds more resources than Impala does. In some cases this seems to cause problems that lead to the query failing, e.g. when the thread manager is oversubscribed and repeatedly sends expansion requests for more vcores, which keep timing out. Even if some of the earlier expansion requests are eventually fulfilled, the thread manager doesn't know this and keeps sending more expansion requests.
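
The accounting drift described above can be illustrated with a toy model. This is only a sketch: `ToyLlama`, `ToyBroker`, and `simulate` are hypothetical names invented for illustration and do not mirror the real ResourceBroker or Llama interfaces. It assumes every request times out on the Impala side before Llama gets around to fulfilling it.

```python
# Toy model only: these classes are hypothetical and do not mirror the real
# Impala ResourceBroker or Llama APIs.

class ToyLlama:
    """Grants expansions whenever it gets to them; unaware of client timeouts."""

    def __init__(self):
        self.pending = []        # queued (request_id, vcores) pairs
        self.granted_vcores = 0  # what Llama charges to the reservation

    def request_expansion(self, request_id, vcores):
        self.pending.append((request_id, vcores))

    def fulfill_all(self):
        """Fulfill every queued request and return the notifications sent."""
        notifications = list(self.pending)
        self.granted_vcores += sum(vcores for _, vcores in self.pending)
        self.pending.clear()
        return notifications


class ToyBroker:
    """Impala side: forgets a request once it has timed out."""

    def __init__(self, llama):
        self.llama = llama
        self.waiting = set()        # request ids still being waited on
        self.accounted_vcores = 0   # what Impala believes it holds
        self.late_fulfillments = 0  # grants that arrived after a timeout

    def expand(self, request_id, vcores):
        self.llama.request_expansion(request_id, vcores)
        self.waiting.add(request_id)

    def time_out(self, request_id):
        self.waiting.discard(request_id)

    def on_fulfilled(self, request_id, vcores):
        if request_id in self.waiting:
            self.waiting.remove(request_id)
            self.accounted_vcores += vcores
        else:
            # Behavior described in this issue: the late grant is only noted.
            self.late_fulfillments += 1


def simulate(num_requests=3, vcores_each=2):
    """Return (vcores charged by Llama, vcores accounted by Impala, late grants)."""
    llama = ToyLlama()
    broker = ToyBroker(llama)
    for request_id in range(num_requests):
        broker.expand(request_id, vcores_each)
        broker.time_out(request_id)  # times out before Llama responds
    for request_id, vcores in llama.fulfill_all():
        broker.on_fulfilled(request_id, vcores)
    return llama.granted_vcores, broker.accounted_vcores, broker.late_fulfillments
```

With the defaults, `simulate()` returns `(6, 0, 3)`: the toy Llama has charged 6 vcores to the reservation while the toy broker has accounted for none of them.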

      There are several things we might consider doing:
      1) In Impala, attempt to account for the resources anyway, even though the request timed out. In some cases this would work fine, e.g. when the vcore expansion thread makes a request that times out: it will likely still want those vcores later and would send another expansion request anyway (which is sometimes a problem as well, see IMPALA-1852). This might not work as well for memory, because the mem tracker that requested the memory may no longer need it (e.g. if it spilled) or, even worse, the query may already have failed because the minimum buffers couldn't be acquired.
      2) Add a mechanism to Llama to release the expansion resources. This is not yet possible; currently only the entire reservation can be released.
      3) Add a mechanism to Llama so that the timeouts occur in Llama first, so that Llama does not charge the reservation for those resources.
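
Options 1 and 3 might look roughly like the following. This is a hedged sketch, not a proposed implementation: `handle_late_fulfillment` and `fulfill_with_deadline` are hypothetical helpers, and the idea of a per-request deadline being passed to Llama is an assumption made for illustration. Neither corresponds to an existing Impala or Llama function.

```python
# Hypothetical sketch; none of these names exist in Impala or Llama.

def handle_late_fulfillment(accounted, resource, amount, still_needed):
    """Option 1: absorb a late grant if the requester can still use it.

    Returns (new_accounted, unused). `unused` is the amount that was granted
    but has no consumer, which could only be given back with something like
    option 2.
    """
    new_accounted = dict(accounted)
    if still_needed:
        new_accounted[resource] = new_accounted.get(resource, 0) + amount
        return new_accounted, 0
    return new_accounted, amount


def fulfill_with_deadline(pending, now):
    """Option 3: the allocator expires requests itself before granting.

    `pending` is a list of (request_id, amount, deadline) tuples. Returns
    (granted_ids, expired_ids); expired requests are never charged to the
    reservation.
    """
    granted, expired = [], []
    for request_id, _amount, deadline in pending:
        if now <= deadline:
            granted.append(request_id)
        else:
            expired.append(request_id)
    return granted, expired
```

For example, `handle_late_fulfillment({"vcores": 2}, "vcores", 2, True)` folds the late vcores into the accounted total, whereas a late memory grant with `still_needed=False` (say, after the operator spilled) is reported back as unused. That leftover is the case option 1 cannot handle on its own.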

          People

            mjacobs Matthew Jacobs