[IMPALA-1580] Optimize conversion of row batch to query result set - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Duplicate
Affects Version/s: Impala 2.0.1
Fix Version/s: None
Component/s: Perf Investigation
Labels:
- performance
- ramp-up

Target Version:

Product Backlog

Description

For simple queries that produce a large result set such as "select * from tpch.lineitem" the server execution time is limited by the time required to convert row batches (results in the internal structure) to query results (the structure to be sent to the client). The data conversion is the limiting factor in this case because the query plan execution happens in parallel.

Here are some data points from the profile of "select * from tpch.lineitem" using HS2 (this was taken using --exchg_node_buffer_size_bytes=2048576000 so the exchange node would never block because of a full buffer.). Beeswax takes even longer to convert the rows.

Query Timeline: 1m9s
Execution Profile – Total: 1s295ms
ClientFetchWaitTimer: 52s553ms
RowMaterializationTimer: 15s216ms
Coordinator Fragment F01:(Total: 1s092ms
Averaged Fragment F00:(Total: 5s608ms

So the "RowMaterializationTimer", which is actually conversion time, adds ~9 seconds or ~2x the plan execution time to the overall time.

Ideally the conversion time would be codegen'd but even without that there should be a lot of room for improvement by reducing function calls.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

select_lineitem.profile
05/Dec/14 21:49
18 kB
casey

Issue Links

duplicates

IMPALA-7477 Improve QueryResultSet interface to allow appending a batch of rows at a time

Resolved

Activity

People

Assignee:: Unassigned

Reporter:: casey

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 05/Dec/14 21:49

Updated:: 16/Sep/19 23:07

Resolved:: 16/Sep/19 23:07