There's an implicit state machine in the main loop of PartitionedHashJoinNode::GetNext(): https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510
The state is implicitly defined based on the following conditions:
- !output_build_partitions_.empty() -> "outputting build rows after probing"
- builder_->null_aware_partition() == NULL -> "eos, because the null-aware partition is processed after all other partitions"
- null_probe_output_idx_ >= 0 -> "null probe rows being processed"
- output_null_aware_probe_rows_running_ -> "null-aware partition being processed"
- probe_batch_pos_ != -1 -> "processing probe batch"
- builder_->num_hash_partitions() != 0 -> "have active hash partitions that are being probed"
- spilled_partitions_.empty() -> "no more spilled partitions"
I think this would be a lot easier to follow if the state machine were explicit and documented. An explicit state machine would also make separating out the build side of a spilling hash join easier to get right.
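
As a rough sketch of what an explicit version could look like: the enum below names one state per condition listed above, and a helper derives the state from a snapshot of those conditions. Everything here is hypothetical (JoinState, JoinConditions, DeriveState and StateName are not existing Impala identifiers), and the priority order of the checks is an assumption that would need to be verified against the actual GetNext() loop.

```cpp
#include <cassert>

// Hypothetical explicit states. Each comment names the implicit condition
// from the list above that the state would replace.
enum class JoinState {
  OUTPUTTING_BUILD_ROWS,           // !output_build_partitions_.empty()
  PROCESSING_NULL_PROBE_ROWS,      // null_probe_output_idx_ >= 0
  PROCESSING_NULL_AWARE_PARTITION, // output_null_aware_probe_rows_running_
  PROCESSING_PROBE_BATCH,          // probe_batch_pos_ != -1
  PROBING_HASH_PARTITIONS,         // builder_->num_hash_partitions() != 0
  SPILLED_PARTITIONS_PENDING,      // !spilled_partitions_.empty()
  NULL_AWARE_PARTITION_PENDING,    // builder_->null_aware_partition() != NULL
  DONE                             // eos
};

// Hypothetical snapshot of the member variables the loop inspects today.
struct JoinConditions {
  bool has_output_build_partitions = false;
  int null_probe_output_idx = -1;
  bool output_null_aware_probe_rows_running = false;
  int probe_batch_pos = -1;
  int num_hash_partitions = 0;
  bool has_spilled_partitions = false;
  bool has_null_aware_partition = false;
};

// Maps the implicit conditions to one explicit state.
// ASSUMPTION: the order of these checks is illustrative only; it is not
// taken from the real control flow in GetNext().
JoinState DeriveState(const JoinConditions& c) {
  if (c.has_output_build_partitions) return JoinState::OUTPUTTING_BUILD_ROWS;
  if (c.null_probe_output_idx >= 0) return JoinState::PROCESSING_NULL_PROBE_ROWS;
  if (c.output_null_aware_probe_rows_running) {
    return JoinState::PROCESSING_NULL_AWARE_PARTITION;
  }
  if (c.probe_batch_pos != -1) return JoinState::PROCESSING_PROBE_BATCH;
  if (c.num_hash_partitions != 0) return JoinState::PROBING_HASH_PARTITIONS;
  if (c.has_spilled_partitions) return JoinState::SPILLED_PARTITIONS_PENDING;
  // The null-aware partition is processed after all other partitions.
  if (c.has_null_aware_partition) return JoinState::NULL_AWARE_PARTITION_PENDING;
  return JoinState::DONE;
}

// Human-readable name, e.g. for logging state transitions.
const char* StateName(JoinState s) {
  switch (s) {
    case JoinState::OUTPUTTING_BUILD_ROWS: return "OUTPUTTING_BUILD_ROWS";
    case JoinState::PROCESSING_NULL_PROBE_ROWS: return "PROCESSING_NULL_PROBE_ROWS";
    case JoinState::PROCESSING_NULL_AWARE_PARTITION:
      return "PROCESSING_NULL_AWARE_PARTITION";
    case JoinState::PROCESSING_PROBE_BATCH: return "PROCESSING_PROBE_BATCH";
    case JoinState::PROBING_HASH_PARTITIONS: return "PROBING_HASH_PARTITIONS";
    case JoinState::SPILLED_PARTITIONS_PENDING: return "SPILLED_PARTITIONS_PENDING";
    case JoinState::NULL_AWARE_PARTITION_PENDING:
      return "NULL_AWARE_PARTITION_PENDING";
    case JoinState::DONE: return "DONE";
  }
  assert(false && "unhandled JoinState");
  return "UNKNOWN";
}
```

In a real change the state would more likely be stored as a member and updated at each transition, rather than re-derived from the flags; the derivation above is only meant to show how the existing conditions line up with named states.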