Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-1744

Improve query start-up efficiency.

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.1, Impala 2.1.1
    • Fix Version/s: None
    • Component/s: None

      Description

      The way we initiate remote work on behalf of a query can be made more efficient in several ways. The main problems with our current approach are:
      1. We issue one RPC per fragment per host. This adds up for large clusters and.or complex queries. We also ship redundant data, e.g., every fragment gets its own copy of the DescriptorTable.
      2. For each fragment, we start all its fragment instances in parallel, each RPC in its own thread. For large clusters, the number of threads can be substantial.

      Suggested improvements:
      1. We should group the fragment instances by host, and only issue one RPC per host to start up all fragments that host should run. On the remote side, care must be taken to start the fragments in the correct order. Initiating work in this way, will allow us to send only a single DescriptorTable that can be shared by all fragments instances on a particular host.
      2. Use a thread pool for starting remote work to avoid excessive thread use.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                henryr Henry Robinson
                Reporter:
                alex.behm Alexander Behm
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: