IMPALA-10973

Empty scan nodes are scheduled to the (exclusive) coordinator

Details

    Description

      Currently, fragments with scan nodes that have no scan ranges are scheduled on the coordinator, even if it is an exclusive coordinator:
      https://github.com/apache/impala/blob/master/be/src/scheduling/scheduler.cc#L805

      As "parent" fragments are often scheduled to be co-located with their children, the condition of "being scheduled on the coordinator" can spread up through the plan tree.
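      The propagation can be illustrated with a toy model. Everything in it is hypothetical: the `Fragment` class, the `schedule` function and the host name `coordinator` are illustrations, not Impala's scheduler code, and the rules are reduced to the two mentioned in this issue (a scan node without scan ranges is placed on the coordinator; a parent fragment is co-located with its child).

```python
from dataclasses import dataclass
from typing import List, Optional, Set

COORDINATOR = "coordinator"  # hypothetical name of the exclusive coordinator host


@dataclass
class Fragment:
    """Hypothetical, simplified plan fragment (not Impala's real data structure)."""
    name: str
    # None: the fragment has no scan node; []: it has a scan node with no scan ranges.
    scan_range_hosts: Optional[List[str]] = None
    # The child fragment this fragment is co-located with, if any.
    child: Optional["Fragment"] = None


def schedule(fragment: Fragment) -> Set[str]:
    """Return the hosts a fragment runs on under the simplified rules."""
    if fragment.scan_range_hosts is not None:
        if not fragment.scan_range_hosts:
            # Behavior described in the issue: an empty scan goes to the coordinator.
            return {COORDINATOR}
        return set(fragment.scan_range_hosts)
    if fragment.child is not None:
        # Co-located parent: it inherits whatever hosts its child was given.
        return schedule(fragment.child)
    # Fragment without scans or children (simplification).
    return {COORDINATOR}


# A scan of an empty table drags every co-located ancestor onto the coordinator.
empty_scan = Fragment("scan emptytable", scan_range_hosts=[])
aggregation = Fragment("aggregation", child=empty_scan)
join = Fragment("join", child=aggregation)
print(schedule(join))  # {'coordinator'}
```

      In this model a single empty scan at a leaf is enough to move every fragment above it onto the coordinator, which is the "spreading" effect described above.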

      This can be disastrous for scalability in clusters with many executors but few coordinators. It is also very counter-intuitive, as scanning an empty table shouldn't have a major effect on the query.

      To reproduce locally:
      bin/start-impala-cluster.py --use_exclusive_coordinators -c 1
      In impala-shell:
      select id from functional.alltypes;
      profile; -- scan nodes will be scheduled to 2 hosts

      select f2 from functional.emptytable union all select id from functional.alltypes;
      profile; -- scan nodes will be scheduled to 3 hosts
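      The host counts in the reproduction can be explained with a small sketch. It assumes a three-node local cluster with one exclusive coordinator and two executors (which the "2 hosts" result for the plain scan suggests), and it assumes that a fragment holding several scan nodes, as under the UNION ALL, runs on the union of the hosts chosen for each scan node. The host names and helper functions are hypothetical.

```python
from typing import List, Set

COORDINATOR = "coordinator"  # hypothetical; the exclusive coordinator
EXECUTORS = ["executor-1", "executor-2"]  # assumed: the two executors of the local cluster


def scan_hosts(scan_range_hosts: List[str]) -> Set[str]:
    # Behavior described in the issue: a scan with no ranges is put on the coordinator.
    return set(scan_range_hosts) if scan_range_hosts else {COORDINATOR}


def fragment_hosts(scans: List[List[str]]) -> Set[str]:
    # Assumption: a fragment with several scan nodes runs on the union of their hosts.
    hosts: Set[str] = set()
    for ranges in scans:
        hosts |= scan_hosts(ranges)
    return hosts


alltypes = EXECUTORS       # assumed: functional.alltypes has ranges on both executors
emptytable: List[str] = []  # functional.emptytable has no scan ranges

print(len(fragment_hosts([alltypes])))              # 2
print(len(fragment_hosts([emptytable, alltypes])))  # 3
```

      The extra host in the second query is the exclusive coordinator, pulled in only by the empty scan.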

          People

            csringhofer Csaba Ringhofer
            Votes: 0
            Watchers: 5