IMPALA / IMPALA-2564

Introduce mechanism to limit query fan-out

    Details

      Description

      The target use case is small queries on large clusters.

      Today Impala schedules a query on all Impalad instances regardless of how much data each Impalad would read. For small queries this spreads the work too thinly across nodes and exposes undesired scalability issues.

      The proposal is to introduce a parameter that controls the minimum and maximum amount of data read by a single Impalad instance.
      The SimpleScheduler would combine several splits in order to satisfy the minimum size requirement for a single Impalad before moving on to the next node.
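
      The batching idea can be illustrated with a minimal sketch. This is not the SimpleScheduler code: the function name assign_splits, the parameter min_bytes_per_node, and the representation of a split as a (path, length) tuple are all hypothetical, and the sketch ignores data locality and the proposed maximum limit.

```python
from collections import OrderedDict


def assign_splits(splits, nodes, min_bytes_per_node):
    """Greedy sketch: keep giving splits to one node until it has received
    at least min_bytes_per_node, then move on to the next node.

    splits: list of (path, length_in_bytes) tuples (hypothetical shape).
    nodes:  list of node names; reused round-robin if splits remain.
    Returns an OrderedDict mapping node name -> list of assigned splits.
    """
    if not nodes:
        raise ValueError("at least one node is required")

    assignments = OrderedDict()
    node_idx = 0
    bytes_this_visit = 0  # bytes handed to the current node in this visit

    for split in splits:
        node = nodes[node_idx % len(nodes)]
        assignments.setdefault(node, []).append(split)
        bytes_this_visit += split[1]
        # Only advance once the current node has met the minimum.
        if bytes_this_visit >= min_bytes_per_node:
            node_idx += 1
            bytes_this_visit = 0

    return assignments


if __name__ == "__main__":
    MB = 1024 * 1024
    splits = [("part-%d" % i, 64 * MB) for i in range(4)]
    nodes = ["impalad-%d" % i for i in range(10)]
    plan = assign_splits(splits, nodes, min_bytes_per_node=128 * MB)
    for node, assigned in plan.items():
        print(node, [path for path, _ in assigned])
```

      With four 64 MB splits, ten nodes, and a 128 MB minimum, the sketch places the query on two nodes rather than four, which is the reduction in fan-out the proposal is after.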

            People

            • Assignee: Unassigned
            • Reporter: Mostafa Mokhtar (mmokhtar)
            • Votes: 0
            • Watchers: 2
