[IMPALA-9127] Clean up probe-side state machine in hash join - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: Impala 3.4.0
Component/s: Backend
Labels:
- multithreading

Epic Link:
Impala Multi-threaded query execution
Target Version:
Product Backlog

Description

There's an implicit state machine in the main loop of PartitionedHashJoinNode::GetNext(): https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510

The state is implicitly defined based on the following conditions:

- !output_build_partitions_.empty() -> "outputting build rows after probing"
- builder_->null_aware_partition() == NULL -> "eos, because the null-aware partition is processed after all other partitions"
- null_probe_output_idx_ >= 0 -> "null probe rows being processed"
- output_null_aware_probe_rows_running_ -> "null-aware partition being processed"
- probe_batch_pos_ != -1 -> "processing probe batch"
- builder_->num_hash_partitions() != 0 -> "have active hash partitions that are being probed"
- spilled_partitions_.empty() -> "no more spilled partitions"
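To make the ambiguity concrete, the list above can be encoded directly as condition/interpretation pairs. This is a minimal sketch: `ProbeSideFields` and `MatchingInterpretations()` are hypothetical names, and the fields only mirror the members quoted in the list, not Impala's real class layout. Nothing forces the conditions to be mutually exclusive, so one snapshot of the fields can match several interpretations at once; only the order of the checks inside GetNext() decides which one is actually in effect.

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot of the fields read by the implicit conditions.
// Names mirror the members quoted in the issue; not Impala's actual layout.
struct ProbeSideFields {
  bool output_build_partitions_empty = true;          // output_build_partitions_.empty()
  bool null_aware_partition_is_null = false;          // null_aware_partition() == NULL
  int null_probe_output_idx = -1;                     // null_probe_output_idx_
  bool output_null_aware_probe_rows_running = false;  // output_null_aware_probe_rows_running_
  int probe_batch_pos = -1;                           // probe_batch_pos_
  int num_hash_partitions = 0;                        // builder_->num_hash_partitions()
  bool spilled_partitions_empty = true;               // spilled_partitions_.empty()
};

// Returns every interpretation from the list whose condition currently holds,
// in list order. More than one entry means the "state" is ambiguous without
// knowing the order in which GetNext() tests the conditions.
std::vector<std::string> MatchingInterpretations(const ProbeSideFields& f) {
  std::vector<std::string> out;
  auto add = [&out](bool holds, const char* meaning) {
    if (holds) out.emplace_back(meaning);
  };
  add(!f.output_build_partitions_empty, "outputting build rows after probing");
  add(f.null_aware_partition_is_null, "eos");
  add(f.null_probe_output_idx >= 0, "null probe rows being processed");
  add(f.output_null_aware_probe_rows_running, "null-aware partition being processed");
  add(f.probe_batch_pos != -1, "processing probe batch");
  add(f.num_hash_partitions != 0, "have active hash partitions that are being probed");
  add(f.spilled_partitions_empty, "no more spilled partitions");
  return out;
}
```

For example, a snapshot taken while a probe batch is in progress (probe_batch_pos = 0) with 16 active hash partitions and nothing spilled matches three interpretations at once: "processing probe batch", "have active hash partitions that are being probed" and "no more spilled partitions".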

I think this would be a lot easier to follow if the state machine were explicit and documented. It would also make it easier to correctly separate out the build side of a spilling hash join.
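One possible shape for an explicit, documented state machine is sketched below. It is a simplified illustration under stated assumptions, not Impala's implementation: `ProbeState`, `TransitionInputs`, `AfterPartitionsDone()` and `NextState()` are hypothetical names, the states are a condensed version of the interpretations listed above, and the transition order (in particular, emitting null probe rows after the null-aware partition) is an assumption. The contract is that `NextState()` is called only once the current state has no more rows to produce.

```cpp
// Hypothetical explicit probe-side states (names are illustrative, not Impala's).
enum class ProbeState {
  PROBING,                // Probing rows of the current probe batch against partitions.
  OUTPUTTING_UNMATCHED,   // Probe input done; emitting unmatched build rows.
  OUTPUTTING_NULL_AWARE,  // All other partitions done; processing null-aware partition.
  OUTPUTTING_NULL_PROBE,  // Emitting probe rows with NULL join keys.
  EOS                     // Nothing left to output; GetNext() reports eos.
};

// Hypothetical facts the transition logic needs, gathered in one place
// instead of being spread across member variables.
struct TransitionInputs {
  bool needs_unmatched_output;  // Join type must emit unmatched build rows.
  bool has_spilled_partitions;  // Spilled partitions are still waiting to be processed.
  bool is_null_aware;           // Null-aware anti join.
};

// Shared exit path once the current set of partitions has been fully handled.
ProbeState AfterPartitionsDone(const TransitionInputs& in) {
  if (in.has_spilled_partitions) return ProbeState::PROBING;  // Next spilled partition.
  if (in.is_null_aware) return ProbeState::OUTPUTTING_NULL_AWARE;
  return ProbeState::EOS;
}

// Every legal transition is listed here and nowhere else.
ProbeState NextState(ProbeState current, const TransitionInputs& in) {
  switch (current) {
    case ProbeState::PROBING:
      return in.needs_unmatched_output ? ProbeState::OUTPUTTING_UNMATCHED
                                       : AfterPartitionsDone(in);
    case ProbeState::OUTPUTTING_UNMATCHED:
      return AfterPartitionsDone(in);
    case ProbeState::OUTPUTTING_NULL_AWARE:
      return ProbeState::OUTPUTTING_NULL_PROBE;
    case ProbeState::OUTPUTTING_NULL_PROBE:
    case ProbeState::EOS:
      return ProbeState::EOS;
  }
  return ProbeState::EOS;  // Unreachable for valid enum values.
}
```

For a join that must emit unmatched build rows (e.g. a right outer join) with one spilled partition, this walks PROBING, OUTPUTTING_UNMATCHED, PROBING, OUTPUTTING_UNMATCHED, EOS; a null-aware anti join with nothing spilled walks PROBING, OUTPUTTING_NULL_AWARE, OUTPUTTING_NULL_PROBE, EOS. Keeping all transitions in one switch means a new phase has to be added, and documented, in exactly one place, which is the property the issue asks for.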

Issue Links

is depended upon by

IMPALA-4224 Add backend support for join build sinks in parallel plans

Resolved

People

Assignee: Tim Armstrong

Reporter: Tim Armstrong

Votes: 0

Watchers: 2

Dates

Created: 05/Nov/19 19:53

Updated: 26/Nov/19 17:15

Resolved: 26/Nov/19 17:15