IMPALA-8656

Support for eagerly fetching and spooling all query result rows

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.12.0, Impala 3.2.0
    • Fix Version/s: Impala 3.4.0
    • Component/s: Backend
    • Labels: None

    Description

      Impala's current interaction with clients is pull-based: it relies on clients fetching results to trigger the generation of more result row batches, until all the result rows have been produced. If a client issues a query without fetching all the results, the query fragments will continue to consume resources until the query is cancelled and unregistered for whatever reason. This is undesirable because resources are held up by misbehaving clients, and other queries may wait in admission control for an extended period of time as a result.

      The high-level idea for this JIRA is for Impala to have a mode in which the result sets of queries are eagerly fetched and spooled somewhere (preferably to some persistent storage). In this way, the cluster's resources are freed up once all result rows have been fetched and stored in the spooling location. Incoming client fetches can then be served from this spooled location.
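
      To make the idea concrete, here is a rough Python sketch of the proposed mode. It is illustrative only and is not Impala code: the class SpooledResultSet, the row_batches() generator, and the resources_released flag are all hypothetical names, and a local temporary file stands in for the spooling location. It also assumes, for simplicity, that client fetches block until spooling has finished.

```python
import pickle
import tempfile
import threading
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


class SpooledResultSet:
    """Hypothetical sketch: eagerly drain row batches into a spool file,
    then serve client fetches from the spool instead of the producer."""

    def __init__(self, batches: Iterator[List[Row]]) -> None:
        # A temp file stands in for the spooling location (assumption).
        self._spool = tempfile.TemporaryFile()
        self._done = threading.Event()
        # Stand-in for "query fragments have released their resources".
        self.resources_released = False
        self._worker = threading.Thread(target=self._drain, args=(batches,))
        self._worker.start()

    def _drain(self, batches: Iterator[List[Row]]) -> None:
        # Eager side: consume every batch without waiting for a client fetch.
        for batch in batches:
            pickle.dump(batch, self._spool)
        self._spool.flush()
        self._spool.seek(0)  # rewind so fetches read from the first batch
        self.resources_released = True
        self._done.set()

    def fetch(self) -> List[Row]:
        """Return the next spooled batch, or [] when all rows were returned."""
        # Simplification: wait for spooling to finish before reading.
        self._done.wait()
        try:
            return pickle.load(self._spool)
        except EOFError:
            return []


def row_batches() -> Iterator[List[Row]]:
    # Hypothetical stand-in for query fragments producing row batches.
    for start in range(0, 6, 2):
        yield [(i, f"row-{i}") for i in range(start, start + 2)]


results = SpooledResultSet(row_batches())
first_batch = results.fetch()
```

      In this sketch the producer finishes as soon as its last batch is spooled, regardless of how quickly (or whether) the client calls fetch() again, which is the property the description asks for. A real design would also need to bound the size of the spool and allow fetches to overlap with spooling.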

      cc'ing stakiar, twm378, joemcdonnell, lv

            People

              Assignee: Sahil Takiar (stakiar)
              Reporter: Michael Ho (kwho)
              Votes: 0
              Watchers: 5
