Hue added a feature where after a user runs a query in Impala, we check the Query Profile (from the ImpalaD Web UI) for the RowsProduced statistic (from the Coordinator Fragment) and report that back as the total rows returned.
We're noticing that for some long-running queries, RowsProduced is incorrect right after the query completes (for example, reporting 4 when 198 rows were returned), but is correct a few seconds later (validated by checking the query profile manually). We discovered that by adding a delay of a few seconds before reading the profile, we can usually get the correct RowsProduced.
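For illustration, the delay workaround could be tightened slightly by polling the profile until the value stops changing. This is only a rough sketch, not what Hue actually does: `fetch_profile` is a hypothetical callable that returns the profile text, the `- RowsProduced: 198 (198)` line format is an assumption about how the counter is rendered, and the sketch simply takes the first match rather than locating the Coordinator Fragment section. It is still a heuristic, since a stale value that repeats across polls would be accepted.

```python
import re
import time

# Assumed rendering of the counter in the text profile, e.g.
# "- RowsProduced: 198 (198)" or "- RowsProduced: 1.00K (1000)".
# A bare "RowsProduced: 4" is accepted as a fallback.
_ROWS_PRODUCED = re.compile(
    r"RowsProduced:\s*[^\s(]+\s*\((\d+)\)|RowsProduced:\s*(\d+)"
)


def extract_rows_produced(profile_text):
    """Return the first RowsProduced value found in the profile, or None."""
    match = _ROWS_PRODUCED.search(profile_text)
    if match is None:
        return None
    return int(match.group(1) or match.group(2))


def wait_for_stable_rows(fetch_profile, stable_polls=2, max_polls=10,
                         interval=1.0, sleep=time.sleep):
    """Poll until the same value is seen `stable_polls` times in a row.

    `fetch_profile` is a hypothetical zero-argument callable returning the
    profile text. Returns the last value seen if it never stabilizes.
    """
    last = None
    streak = 0
    for attempt in range(max_polls):
        value = extract_rows_produced(fetch_profile())
        if value is not None and value == last:
            streak += 1
        else:
            # A new (or first) value restarts the streak; a missing
            # counter resets it to zero.
            streak = 1 if value is not None else 0
        last = value
        if streak >= stable_polls:
            return value
        if attempt < max_polls - 1:
            sleep(interval)
    return last
```

This replaces a fixed sleep with a bounded number of polls, but it does not remove the underlying race, which is why a signal in the profile itself would be preferable.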
But I was wondering if there's something smarter we can do, by checking either a value in the query profile itself or somewhere else. We tried checking the hasResults value on the Thrift result handle as well as the status of the operation handle, but unfortunately neither helps: they can be True or SUCCESSFUL even though the query profile doesn't yet have the right RowsProduced number.
Can something be added to the Query Profile itself to indicate that the RowsProduced is correct?
EDIT: As documented in the discussion below, relying on profile finalization is not the right way to guarantee that the value of RowsProduced is final, which was the original intent. It still makes sense to add a profile finalization counter to indicate that the final update has been received from the last fragment.