Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
There are some obvious inefficiencies with how the query state record works:
- We do an unnecessary copy of the archive string when adding it to the query log: https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1812
- We eagerly convert the profile to text and JSON, when in many cases neither will be needed: https://github.com/apache/impala/blob/79aae231443a305ce8503dbc7b4335e8ae3f3946/be/src/service/impala-server.cc#L1839. I think it is rare for more than one profile format to be downloaded from the web UI. I know of tools that scrape the thrift profile, but the human-readable version would usually only be consumed by humans. We could avoid this by storing only the thrift representation of the profile, then reconstituting the other representations from thrift if requested.
- After ComputeExecSummary(), the profile shouldn't change, but we regenerate the thrift representation on every web request to get the encoded profile. This may waste a lot of CPU for tools scraping the profiles.
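To make the three points concrete, here is a rough sketch of the shape the record could take. It is illustrative only and is not Impala's actual QueryStateRecord: the class name `ProfileRecord`, the `Renderer` callback, and the decision to cache the derived text are all assumptions made for the example. The idea is to move the encoded archive in rather than copy it, keep it as the only stored representation, and derive the text form on first request.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a query log entry (not Impala's real class).
class ProfileRecord {
 public:
  // Hypothetical callback that turns the encoded archive into another
  // representation (for example text or JSON).
  using Renderer = std::function<std::string(const std::string&)>;

  // The archive is taken by value so that callers can std::move() it in,
  // which avoids copying a potentially large string.
  ProfileRecord(std::string encoded_archive, Renderer to_text)
    : encoded_archive_(std::move(encoded_archive)),
      to_text_(std::move(to_text)) {}

  // The encoded form is stored once and handed out by reference, so serving
  // it repeatedly does not re-serialize anything.
  const std::string& encoded_archive() const { return encoded_archive_; }

  // The text form is derived from the archive on first request and then
  // cached (an assumption; regenerating it each time would also be valid).
  const std::string& text() {
    std::lock_guard<std::mutex> l(lock_);
    if (!text_ready_) {
      text_ = to_text_(encoded_archive_);
      text_ready_ = true;
    }
    return text_;
  }

 private:
  const std::string encoded_archive_;
  Renderer to_text_;
  std::mutex lock_;
  bool text_ready_ = false;
  std::string text_;
};
```

Whether to cache the derived text is a memory versus CPU trade-off. If human-readable profiles are rarely requested, regenerating them on demand and caching only the encoded archive may be the better choice.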