In a production service with a large TDB store (around 500MT) we find that some complex queries evade the query timeouts (set to 90s first result, 120s total) and then run for hours, soaking up all available CPU cores. While the queries show no clear pattern, and it has been hard to replicate in a controlled setting, we do now have one example which is expressible as a test case. See attached.
The behaviour is that the abort() call from the alarm timeout is received by QueryExecDataset before there is an iterator to cancel - the QueryExecDataset instance is deep in getPlan(), which itself executes part of the query. In the specific example it's OpSlice which is iterating through the offset while still in the planning phase, though not all queries which cause this sort of behaviour use offsets.
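
To make the failure mode concrete, here is a minimal, self-contained Java sketch of the race. It does not use any Jena classes: ToyExec, its fields and the checkDuringPlanning switch are hypothetical names invented for illustration, and the loop in getPlan() merely stands in for OpSlice stepping through an offset. It assumes the abort arrives before planning has produced an iterator, and shows that the abort is only honoured if the planning loop itself looks at a cancel flag.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortDuringPlanning {

    /** Hypothetical stand-in for a query execution; NOT a Jena class. */
    static class ToyExec {
        private final boolean checkDuringPlanning;
        private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
        private volatile Iterator<Integer> results; // stays null until planning has finished
        long rowsSkipped = 0;

        ToyExec(boolean checkDuringPlanning) {
            this.checkDuringPlanning = checkDuringPlanning;
        }

        /** Called by the timeout alarm, possibly before any iterator exists. */
        void abort() {
            cancelRequested.set(true);
            // While results is still null there is nothing else to cancel here.
        }

        /** "Planning" that already does real work, like skipping an offset. */
        Iterator<Integer> getPlan(long offset) {
            for (long i = 0; i < offset; i++) {
                if (checkDuringPlanning && cancelRequested.get()) {
                    throw new IllegalStateException("query cancelled during planning");
                }
                rowsSkipped++; // stands in for the work done per skipped row
            }
            results = List.of(1, 2, 3).iterator();
            return results;
        }
    }

    public static void main(String[] args) {
        // Case 1: planning never looks at the flag, so the early abort is lost.
        ToyExec naive = new ToyExec(false);
        naive.abort();
        naive.getPlan(1000);
        System.out.println("naive rows skipped after abort: " + naive.rowsSkipped);

        // Case 2: planning polls the flag and stops as soon as it sees the abort.
        ToyExec aware = new ToyExec(true);
        aware.abort();
        try {
            aware.getPlan(1000);
        } catch (IllegalStateException e) {
            System.out.println("aware: " + e.getMessage()
                    + ", rows skipped: " + aware.rowsSkipped);
        }
    }
}
```

In the first case the abort is recorded but nothing acts on it, which matches the observed behaviour of a query running on past its timeout. The second case is only one possible direction, and it says nothing about how such a flag would be threaded through the real planning code.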
Sorry, but I have no PR to offer at this stage. I have looked at whether it's possible to have getPlan() return some future or deferrable plan, so that the top-level exec has a handle on something that it can abort. However, the changes look far-reaching and I don't yet have a satisfactory approach to offer.