Details
- Type: Task
- Status: Resolved
- Priority: Minor
- Resolution: Won't Fix
- Impala 2.2
- None
Description
Investigate increasing batch size for better performance.
Initial results (columns are batch sizes; blank cells have no recorded result)
Query | 10,000,000 | 1,000,000 | 100,000 | 10,000 | 1,000 | 100 | 10 |
---|---|---|---|---|---|---|---|
broadcast_join_1 | 5 | 4 | 4 | 4 | 3 | 3 | 33 |
broadcast_join_2 | 9 | 7 | 8 | 8 | 12 | 18 | 96 |
broadcast_join_3 | 55 | 63 | 67 | 73 | 81 | 133 | 556 |
exchange_broadcast | 115.59 | 118 | 119 | 149 | 285 | 1,193 | |
exchange_shuffle | 196 | 187 | 186 | 191 | 191.81 | | |
filter_bigint_non_selective | 9 | 6 | 6 | 6 | 7 | 31 | 275 |
filter_bigint_selective | 3 | 3 | 3 | 3 | 3 | 3 | 13 |
filter_decimal_non_selective | 3 | 3 | 3 | 3 | 3 | 9 | 65 |
filter_decimal_selective | 3 | 3 | 3 | 3 | 3 | 10 | 46 |
filter_string_non_selective | 2 | 2 | 2 | 2 | 3 | 17 | 143 |
filter_string_selective | 2 | 2 | 2 | 2 | 2 | 7 | 34 |
groupBy_bigint_highndv | 58 | 55 | 55 | 58 | 55 | 69 | 254 |
groupBy_bigint_lowndv | 13 | 9 | 9 | 9 | 10 | 26 | 202 |
groupBy_decimal_highndv | 116 | 87 | 81 | 105 | 102 | 103 | 257 |
groupBy_decimal_lowndv | 30 | 27 | 28 | 29 | 33 | 45 | 217 |
groupBy_spilling | 566 | 534 | 527 | 523 | 546 | | |
insert_partitioned | 392 | 383 | 375 | 385 | 483 | 451 | 486 |
insert | 392 | 383 | 375 | 385 | 483 | 451 | 486 |
orderby_all | 158 | 176 | 173 | 191 | 323 | | |
orderby_bigint | 30 | 34 | 34 | 35 | 34 | 49 | 281 |
shuffle_join_one_to_many_string_with_groupby | 554 | 568 | 613 | 561 | 549 | 577 | 739 |
shuffle_join_union_all_with_groupby | 97 | 109 | 122 | 119 | 262 | | |
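
To make the trend in the table easier to read, each column can be compared against a reference column. The following is a minimal sketch in Python using three rows copied from the table above. The helper name `relative_to` and the choice of the 1,000 column as the reference are assumptions made for illustration, and the ticket does not state the measurement unit.

```python
# Batch sizes in the same column order as the table above.
BATCH_SIZES = [10_000_000, 1_000_000, 100_000, 10_000, 1_000, 100, 10]

# Three rows copied from the table (the unit is not stated in the ticket).
RESULTS = {
    "broadcast_join_3": [55, 63, 67, 73, 81, 133, 556],
    "filter_bigint_non_selective": [9, 6, 6, 6, 7, 31, 275],
    "groupBy_bigint_lowndv": [13, 9, 9, 9, 10, 26, 202],
}


def relative_to(results, batch_sizes, reference):
    """Return {query: {batch_size: value / value_at_reference}}.

    `reference` must be one of the entries in `batch_sizes`.
    """
    ref_idx = batch_sizes.index(reference)
    out = {}
    for query, values in results.items():
        ref_value = values[ref_idx]
        out[query] = {b: v / ref_value for b, v in zip(batch_sizes, values)}
    return out


# Hypothetical choice: use the 1,000 column as the baseline.
ratios = relative_to(RESULTS, BATCH_SIZES, reference=1_000)
for query, row in ratios.items():
    print(query, {b: round(r, 2) for b, r in row.items()})
```

A ratio above 1 means the query was slower at that batch size than at the reference, and a ratio below 1 means it was faster.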
Issue Links
- relates to
  - IMPALA-2403 Replace BlockingQueue with a concurrent version so that readers and writers don't block (Resolved)
  - IMPALA-2399 Cleanup/rethink QueryMaintenance() calls in the BE. (Open)
+1, I've seen good speedups on Kudu queries with larger batch sizes as well. We spend a lot of time in a few areas which are per-batch:
If bigger row batches have some subtle issues, maybe attacking the above areas would help narrow the gap a bit.
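
The per-batch overhead argument can be illustrated with a toy cost model. This is only a sketch: it assumes total time is a per-row cost plus a fixed cost paid once per batch. The function name `modeled_time` and the cost constants are hypothetical and were not measured from Impala.

```python
import math


def modeled_time(num_rows, batch_size, per_row_cost, per_batch_cost):
    """Toy model: every row costs `per_row_cost`, every batch adds `per_batch_cost`."""
    num_batches = math.ceil(num_rows / batch_size)
    return num_rows * per_row_cost + num_batches * per_batch_cost


# Hypothetical constants chosen only to show the shape of the curve.
NUM_ROWS = 10_000_000
PER_ROW_COST = 1e-7    # assumed seconds per row
PER_BATCH_COST = 1e-5  # assumed fixed seconds per batch

for batch_size in (10, 100, 1_000, 10_000, 100_000):
    t = modeled_time(NUM_ROWS, batch_size, PER_ROW_COST, PER_BATCH_COST)
    print(f"batch_size={batch_size:>7}: modeled time {t:.3f}")
```

In this model the per-batch term dominates at small batch sizes and becomes negligible at large ones, so reducing the fixed per-batch cost narrows the gap between small and large batches without changing the batch size itself.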