Three related things can be improved fairly easily in QueryExecState. I found these when debugging hangs in the debug web pages during query planning.
1. Don't hold the lock during GetExecRequest()
Since planning can take a long time, the fact that ImpalaServer holds (*exec_state)->lock() during GetExecRequest() can lead to problems when any other process (like the client, or a debug page) tries to read the exec state object.
There's no real reason to hold the exec state lock - the only reason given in the code is to make sure that exec state can't be read before the result metadata is set. But we can achieve that a different way by ensuring that all callers of result_metadata() wait for planning to be finished first.
2. Remove path in GetQueryExecState() that can take lock
It's confusing to have a path in ImpalaServer::GetQueryExecState() that allows callers to have the exec state returned with its lock held. The reason we might have this path is to ensure that there's no way that an exec state can be removed from the exec state map once it has been found - for example, if the query was cancelled after it was looked up in the exec state, but before it was returned to the caller. It's easier for callers to check that property explicitly after taking the lock for themselves.
3. Simplify implementation of QueryExecState::Wait()
QueryExecState::Wait() should use promises to have callers coordinate on the wait-thread's completion, rather than the more complex logic based on condvars that's there now. (There's more scope to improve the waiting path, but this is an incremental change).