While working on IMPALA-8818, I found that the fix for IMPALA-8780 does not always cause all non-coordinator fragments to shut down. In certain setups, TopN queries (select * from [table] order by [col] limit [limit]) where all results are successfully spooled still keep non-coordinator fragments alive.
The issue is that sometimes the DATASTREAM SINK for the TopN <-- Scan Node fragment ends up blocked waiting for a response to a TransmitData() RPC, which prevents the fragment from shutting down.
I haven't traced the issue exactly, but what I think is happening is that the MERGING-EXCHANGE operator in the coordinator fragment hits eos as soon as it has received enough rows to reach the limit defined in the query, which can happen before the DATASTREAM SINK has sent all the rows from the TopN / Scan Node fragment.
So the TopN / Scan Node fragments end up hanging until they are explicitly closed.
The fix is to close the ExecNode tree in FragmentInstanceState as eagerly as possible. Moving the close call to before the call to DataSink::FlushFinal fixes the issue and has the added benefit of shutting down and releasing all ExecNode resources as soon as possible. This is particularly important when result spooling is enabled, because FlushFinal might block until the consumer reads all rows. A rough sketch of the new ordering follows.
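For illustration only, here is a minimal sketch of the row-production loop in FragmentInstanceState with the ExecNode tree closed before DataSink::FlushFinal. The member names, signatures, and error handling are simplified and are not copied verbatim from the Impala sources.

{code:c++}
// Rough sketch of FragmentInstanceState::ExecInternal(); names and signatures
// are simplified for illustration and do not match the real code exactly.
Status FragmentInstanceState::ExecInternal(RuntimeState* state) {
  RowBatch batch(exec_tree_->row_desc(), state->batch_size(), mem_tracker());
  bool eos = false;
  // Drive the ExecNode tree and hand each batch to the fragment's DataSink.
  do {
    RETURN_IF_ERROR(exec_tree_->GetNext(state, &batch, &eos));
    RETURN_IF_ERROR(sink_->Send(state, &batch));
    batch.Reset();
  } while (!eos);

  // The fix: close the ExecNode tree *before* DataSink::FlushFinal(). Closing
  // here releases ExecNode resources (scan ranges, exchange receivers, etc.)
  // as early as possible. With result spooling, FlushFinal() can block until
  // the consumer has read all spooled rows, so closing only afterwards would
  // keep upstream fragments waiting on their pending TransmitData() RPCs.
  exec_tree_->Close(state);

  // Signal the sink that no more batches will arrive; this call may block.
  RETURN_IF_ERROR(sink_->FlushFinal(state));
  return Status::OK();
}
{code}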