Details
-
Improvement
-
Status: Resolved
-
Minor
-
Resolution: Duplicate
-
Impala 2.0.1
-
None
Description
For simple queries that produce a large result set such as "select * from tpch.lineitem" the server execution time is limited by the time required to convert row batches (results in the internal structure) to query results (the structure to be sent to the client). The data conversion is the limiting factor in this case because the query plan execution happens in parallel.
Here are some data points from the profile of "select * from tpch.lineitem" using HS2 (this was taken using --exchg_node_buffer_size_bytes=2048576000 so the exchange node would never block because of a full buffer.). Beeswax takes even longer to convert the rows.
- Query Timeline: 1m9s
- Execution Profile – Total: 1s295ms
- ClientFetchWaitTimer: 52s553ms
- RowMaterializationTimer: 15s216ms
- Coordinator Fragment F01:(Total: 1s092ms
- Averaged Fragment F00:(Total: 5s608ms
So the "RowMaterializationTimer", which is actually conversion time, adds ~9 seconds or ~2x the plan execution time to the overall time.
Ideally the conversion time would be codegen'd but even without that there should be a lot of room for improvement by reducing function calls.
Attachments
Attachments
Issue Links
- duplicates
-
IMPALA-7477 Improve QueryResultSet interface to allow appending a batch of rows at a time
- Resolved