[IMPALA-11453] Add option to run-workload.py to have warm-up runs of query - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: Impala 4.2.0
Fix Version/s: None
Component/s: Infrastructure
Labels: None

Description

bin/run-workload.py has an option to run EXPLAIN on a query before executing it the first time. This gets metadata loading out of the way so that it doesn't inflate the time of the first measured run.

It would be useful to add another option that runs the query a couple of times to warm up any caches before measurement starts. This would reduce variation caused by, for example, the data not yet being in the OS buffer cache.

In my runs of perf-AB-test, the first run of a query is sometimes noticeably slower than the runs that follow (for either A or B):

Run 1-3:
22:34:36 | TPCH-Q1  | 2022-07-20 04:16:12 | 7.52           | 1         |
22:34:36 | TPCH-Q1  | 2022-07-20 04:16:20 | 4.82           | 1         |
22:34:36 | TPCH-Q1  | 2022-07-20 04:16:25 | 5.04           | 1         |

Run 1-3:
22:34:36 | TPCH-Q11 | 2022-07-20 04:23:21 | 1.12           | 1         |
22:34:36 | TPCH-Q11 | 2022-07-20 04:23:23 | 0.93           | 1         |
22:34:36 | TPCH-Q11 | 2022-07-20 04:23:23 | 0.97           | 1         |

Run 1-3:
22:34:36 | TPCH-Q12 | 2022-07-20 04:24:13 | 2.23           | 1         |
22:34:36 | TPCH-Q12 | 2022-07-20 04:24:15 | 1.88           | 1         |
22:34:36 | TPCH-Q12 | 2022-07-20 04:24:17 | 1.78           | 1         |
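
To put a number on the effect, the short sketch below compares each first run with the mean of the two runs after it. It assumes the fourth column in the output above is elapsed time in seconds; the function name `first_run_overhead` is made up for this illustration and is not part of any Impala script.

```python
from statistics import mean

# Timings copied from the runs listed above.
# Assumption: the fourth column is elapsed time in seconds.
runs = {
    "TPCH-Q1": [7.52, 4.82, 5.04],
    "TPCH-Q11": [1.12, 0.93, 0.97],
    "TPCH-Q12": [2.23, 1.88, 1.78],
}


def first_run_overhead(times):
    """Return how much slower the first run is than the mean of the
    remaining runs, as a fraction (0.25 means 25% slower)."""
    first, rest = times[0], times[1:]
    return first / mean(rest) - 1.0


overheads = {query: first_run_overhead(times) for query, times in runs.items()}

for query, overhead in overheads.items():
    print(f"{query}: first run {overhead:.0%} slower than later runs")
```

On these numbers the first run comes out roughly 53% (Q1), 18% (Q11) and 22% (Q12) slower than the average of the later runs.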

If we ran the query a couple of times before starting to record timings, the benchmark would be more consistent. This seems like a useful setting for single_node_perf_run.py.
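
A minimal sketch of the proposed behavior, not the actual run-workload.py code: run the query a few times without recording anything, then time the measured iterations. The function `timed_runs` and its `warmup_iterations` parameter are hypothetical names chosen for this example, and `run_query` stands in for whatever callable executes the query.

```python
import time


def timed_runs(run_query, iterations=3, warmup_iterations=2,
               clock=time.perf_counter):
    """Execute run_query warmup_iterations times without timing it,
    then return elapsed seconds for each of `iterations` measured runs.

    Hypothetical helper: the names here are illustrative only.
    """
    # Warm-up phase: results and timings are intentionally discarded.
    for _ in range(warmup_iterations):
        run_query()

    # Measurement phase: only these runs are recorded.
    timings = []
    for _ in range(iterations):
        start = clock()
        run_query()
        timings.append(clock() - start)
    return timings
```

With `warmup_iterations=0` this degenerates to the current behavior, so such an option could default to off and be enabled only where consistency matters more than total runtime.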

People

Assignee: Unassigned

Reporter: Joe McDonnell

Votes: 0

Watchers: 2

Dates

Created: 22/Jul/22 16:27

Updated: 22/Jul/22 19:37