IMPALA-4925 (sub-task of IMPALA-4280: Clean up tasks in Coordinator, QueryExecState and others)

Coordinator does not cancel fragments if query completes w/limit


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Distributed Exec
    • Labels: None

    Description

      If a plan has a limit, the coordinator will eventually set Coordinator::returned_all_results_ once the limit has been hit. At this point, it should start to cancel fragment instances that are still running. Cancellation usually happens either through an explicit cancel RPC, or by returning a non-OK status in response to the heartbeat ReportExecStatus() RPC. In the limit case, neither happens: the query status is not set to !ok() (because the query succeeded!), so there is no 'bad' status to propagate to the fragment instances.
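
      A minimal sketch of that heartbeat path may make the gap clearer. The class and method names below are simplified assumptions for illustration, not the actual Coordinator code; the point is that the only signal sent back to an instance is the current query status, so an OK status after a LIMIT leaves still-running instances untouched.

        // Hypothetical, simplified sketch of the heartbeat path described above;
        // names and signatures do not exactly match the real Impala Coordinator.
        #include <mutex>
        #include <string>

        // Simplified stand-in for Impala's Status class.
        struct Status {
          bool ok = true;
          std::string msg;
          static Status OK() { return Status(); }
        };

        class Coordinator {
         public:
          // Called for each ReportExecStatus() heartbeat from a fragment
          // instance. The return value is sent back to that instance; a
          // non-OK value causes it to tear itself down.
          Status UpdateFragmentExecStatus(const Status& instance_status) {
            std::lock_guard<std::mutex> l(lock_);
            if (!instance_status.ok) query_status_ = instance_status;
            // When a LIMIT is hit, returned_all_results_ becomes true but
            // query_status_ stays OK (the query succeeded), so instances that
            // are still running never see a 'bad' status and keep working.
            return query_status_;
          }

         private:
          std::mutex lock_;
          Status query_status_;
          bool returned_all_results_ = false;  // set once the LIMIT is satisfied
        };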

      In many cases this doesn't matter, because cancellation propagates from the top down: the root instance gets closed and goes away, then any senders to that instance notice and cancel themselves, and so on. But there are plan shapes where a lot of CPU time is wasted after the query should have finished, e.g.:

      WITH l AS (SELECT 1 FROM functional.alltypes GROUP BY month),
           r AS (SELECT count(*) FROM lineitem a CROSS JOIN lineitem b)
      SELECT * FROM l UNION ALL (SELECT * FROM r) LIMIT 2

      This convoluted query illustrates the idea: l is the left union child and gets evaluated first. It produces more than two rows, so the limit gets hit. The right child, in the meantime, is evaluating the cross join that feeds the aggregation, which is very CPU heavy. When the limit is hit, the query hangs (from the client's perspective), waiting for the right child to finish even though none of its results will ever be returned.

      The fix for this is easy: fragment instances should learn about query termination from the response to their ReportExecStatus() RPCs. If returned_all_results_ is true, the coordinator should return a non-OK status, causing the instance to tear itself down the next time it checks its cancellation state.
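
      As a hedged sketch of the proposed behavior (the helper name and types below are hypothetical, not the actual Impala API), the reply to a heartbeat could be computed like this:

        #include <string>

        // Simplified stand-in for Impala's Status class.
        struct Status {
          bool ok = true;
          std::string msg;
          static Status OK() { return Status(); }
          static Status Cancelled() { return Status{false, "Cancelled"}; }
        };

        // Hypothetical helper: the status the coordinator sends back for a
        // ReportExecStatus() heartbeat. A non-OK reply makes the fragment
        // instance tear itself down the next time it checks its cancellation
        // state.
        Status HeartbeatReply(const Status& query_status, bool returned_all_results) {
          if (!query_status.ok) return query_status;              // existing error path
          if (returned_all_results) return Status::Cancelled();   // proposed change
          return Status::OK();                                    // query still running
        }

      The only behavioral change is the middle branch: a query whose status is still OK but whose limit has been satisfied now gets a non-OK (cancelled) reply instead of OK.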


People

    • Assignee: Henry Robinson (henryr)
    • Reporter: Henry Robinson (henryr)
    • Votes: 0
    • Watchers: 3
