IMPALA-2564

Introduce mechanism to limit query fan-out


Details

    Description

      The target use case is small queries on large clusters.

      Today Impala schedules a query on all Impalad instances regardless of how much data each Impalad would read. For small queries this spreads the work too thinly across nodes and exposes scalability problems.

      The proposal is to introduce a parameter that controls the minimum and maximum amount of data read by a single Impalad instance.
      To meet the minimum, the SimpleScheduler would combine several splits and assign them to one Impalad before moving on to the next node.
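
The batching idea in the proposal can be illustrated with a minimal sketch. This is not Impala's SimpleScheduler code: `Split`, `assign_splits`, `min_bytes`, and `max_bytes` are hypothetical names chosen for illustration, and the sketch ignores data locality, which a real scheduler would have to take into account. It only shows how filling one node up to a minimum before advancing reduces the number of nodes a small query touches.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    """Hypothetical scan range: a file path and the number of bytes to read."""
    path: str
    length: int


def assign_splits(splits: List[Split], nodes: List[str],
                  min_bytes: int, max_bytes: int) -> Dict[str, List[Split]]:
    """Greedily batch splits per node (illustrative only, not Impala's algorithm).

    Keep assigning splits to the current node until its batch holds at least
    min_bytes, or until the next split would push the batch past max_bytes.
    Then advance to the next node, wrapping around if every node was visited.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")

    assignment: Dict[str, List[Split]] = {}
    node_idx = 0
    batch_bytes = 0  # bytes assigned to the current node in this pass

    for split in splits:
        reached_min = batch_bytes > 0 and batch_bytes >= min_bytes
        would_overflow = batch_bytes > 0 and batch_bytes + split.length > max_bytes
        if reached_min or would_overflow:
            node_idx = (node_idx + 1) % len(nodes)
            batch_bytes = 0
        assignment.setdefault(nodes[node_idx], []).append(split)
        batch_bytes += split.length

    return assignment


MB = 1024 * 1024
example_splits = [Split(f"part-{i}", 64 * MB) for i in range(6)]
example_nodes = [f"impalad-{i}" for i in range(4)]
plan = assign_splits(example_splits, example_nodes,
                     min_bytes=128 * MB, max_bytes=256 * MB)
```

In this example the six 64 MB splits are packed two per node, so only three of the four nodes receive work. Assigning one split at a time in round-robin order would have touched all four.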


          People

            Unassigned
            mmokhtar Mostafa Mokhtar