  1. IMPALA
  2. IMPALA-9127

Clean up probe-side state machine in hash join

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.4.0
    • Component/s: Backend
    • Labels:

      Description

      There's an implicit state machine in the main loop of PartitionedHashJoinNode::GetNext(): https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510

      The state is implicitly defined based on the following conditions:

      • !output_build_partitions_.empty() -> "outputting build rows after probing"
      • builder_->null_aware_partition() == NULL -> "eos, because the null-aware partition is processed after all other partitions"
      • null_probe_output_idx_ >= 0 -> "null probe rows being processed"
      • output_null_aware_probe_rows_running_ -> "null-aware partition being processed"
      • probe_batch_pos_ != -1 -> "processing probe batch"
      • builder_->num_hash_partitions() != 0 -> "have active hash partitions that are being probed"
      • spilled_partitions_.empty() -> "no more spilled partitions"

      I think this would be a lot easier to follow if the state machine were explicit and documented. It would also make separating out the build side of a spilling hash join easier to get right.
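
To illustrate what "explicit and documented" could look like, here is a minimal standalone sketch. It is not the code from the actual fix: the enum name, the state names, the ImplicitFlags struct and DeriveState() are all hypothetical, and the priority order of the checks is an assumption, since the description only lists the conditions. The struct fields merely stand in for the member variables named above.

```cpp
#include <cassert>

// Hypothetical explicit states, one per condition listed in the description.
// The names are illustrative and are not taken from the Impala source.
enum class ProbeState {
  OUTPUTTING_BUILD_ROWS,            // outputting build rows after probing
  PROCESSING_NULL_PROBE_ROWS,       // null probe rows being processed
  PROCESSING_NULL_AWARE_PARTITION,  // null-aware partition being processed
  PROCESSING_PROBE_BATCH,           // processing probe batch
  PROBING_HASH_PARTITIONS,          // active hash partitions being probed
  PROCESSING_SPILLED_PARTITIONS,    // spilled partitions remain
  EOS,                              // nothing left to do
};

// Stand-in for the member variables that currently encode the state.
struct ImplicitFlags {
  bool has_output_build_partitions = false;  // !output_build_partitions_.empty()
  bool has_null_aware_partition = false;     // builder_->null_aware_partition() != NULL
  int null_probe_output_idx = -1;            // null_probe_output_idx_
  bool output_null_aware_probe_rows_running = false;
  int probe_batch_pos = -1;                  // probe_batch_pos_
  int num_hash_partitions = 0;               // builder_->num_hash_partitions()
  bool has_spilled_partitions = false;       // !spilled_partitions_.empty()
};

// Maps the implicit conditions to one explicit state.
// Assumption: the checks are evaluated in this order.
ProbeState DeriveState(const ImplicitFlags& f) {
  if (f.has_output_build_partitions) return ProbeState::OUTPUTTING_BUILD_ROWS;
  if (f.null_probe_output_idx >= 0) return ProbeState::PROCESSING_NULL_PROBE_ROWS;
  if (f.output_null_aware_probe_rows_running) {
    return ProbeState::PROCESSING_NULL_AWARE_PARTITION;
  }
  if (f.probe_batch_pos != -1) return ProbeState::PROCESSING_PROBE_BATCH;
  if (f.num_hash_partitions != 0) return ProbeState::PROBING_HASH_PARTITIONS;
  if (f.has_spilled_partitions) return ProbeState::PROCESSING_SPILLED_PARTITIONS;
  // The null-aware partition is processed after all other partitions, so it
  // is the last thing that can still be pending before eos.
  if (f.has_null_aware_partition) return ProbeState::PROCESSING_NULL_AWARE_PARTITION;
  return ProbeState::EOS;
}

// Human-readable name for logging or DCHECK-style messages.
const char* ProbeStateName(ProbeState s) {
  switch (s) {
    case ProbeState::OUTPUTTING_BUILD_ROWS: return "OUTPUTTING_BUILD_ROWS";
    case ProbeState::PROCESSING_NULL_PROBE_ROWS: return "PROCESSING_NULL_PROBE_ROWS";
    case ProbeState::PROCESSING_NULL_AWARE_PARTITION:
      return "PROCESSING_NULL_AWARE_PARTITION";
    case ProbeState::PROCESSING_PROBE_BATCH: return "PROCESSING_PROBE_BATCH";
    case ProbeState::PROBING_HASH_PARTITIONS: return "PROBING_HASH_PARTITIONS";
    case ProbeState::PROCESSING_SPILLED_PARTITIONS:
      return "PROCESSING_SPILLED_PARTITIONS";
    case ProbeState::EOS: return "EOS";
  }
  assert(false && "unhandled ProbeState");
  return "UNKNOWN";
}
```

With a single state variable, the main loop could switch on the state instead of re-deriving it from several members on every iteration, and each transition could be documented next to the enum.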

              People

              • Assignee: Tim Armstrong (tarmstrong)
              • Reporter: Tim Armstrong (tarmstrong)
