Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1744

Improve query start-up efficiency.

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • Impala 2.1, Impala 2.1.1
    • None
    • None

    Description

      The way we initiate remote work on behalf of a query can be made more efficient in several ways. The main problems with our current approach are:
      1. We issue one RPC per fragment per host. This adds up for large clusters and.or complex queries. We also ship redundant data, e.g., every fragment gets its own copy of the DescriptorTable.
      2. For each fragment, we start all its fragment instances in parallel, each RPC in its own thread. For large clusters, the number of threads can be substantial.

      Suggested improvements:
      1. We should group the fragment instances by host, and only issue one RPC per host to start up all fragments that host should run. On the remote side, care must be taken to start the fragments in the correct order. Initiating work in this way, will allow us to send only a single DescriptorTable that can be shared by all fragments instances on a particular host.
      2. Use a thread pool for starting remote work to avoid excessive thread use.

      Attachments

        Issue Links

          Activity

            People

              henryr Henry Robinson
              alex.behm Alexander Behm
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: