IMPALA-9127

Clean up probe-side state machine in hash join


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.4.0
    • Component/s: Backend

    Description

      There's an implicit state machine in the main loop of PartitionedHashJoinNode::GetNext(): https://github.com/apache/impala/blob/eea617b/be/src/exec/partitioned-hash-join-node.cc#L510

      The state is implicitly defined based on the following conditions:

      • !output_build_partitions_.empty() -> "outputting build rows after probing"
      • builder_->null_aware_partition() == NULL -> "eos, because the null-aware partition is processed after all other partitions"
      • null_probe_output_idx_ >= 0 -> "null probe rows being processed"
      • output_null_aware_probe_rows_running_ -> "null-aware partition being processed"
      • probe_batch_pos_ != -1 -> "processing probe batch"
      • builder_->num_hash_partitions() != 0 -> "have active hash partitions that are being probed"
      • spilled_partitions_.empty() -> "no more spilled partitions"
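To illustrate why the implicit version is hard to follow, here is a minimal C++ sketch that decodes a state description from the seven conditions above. It assumes the conditions are tested in the order listed and that the three null-aware checks only apply to a null-aware left anti join. The struct and function names (ImplicitProbeFlags, DescribeImplicitState), the is_null_aware_anti_join guard, and the final fallback are hypothetical; this is not Impala code.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of the members named in IMPALA-9127. Field names
// mirror the issue text; this is not Impala's real class layout.
struct ImplicitProbeFlags {
  // Assumption: the null-aware checks only apply to a null-aware left anti join.
  bool is_null_aware_anti_join = false;
  bool output_build_partitions_empty = true;  // output_build_partitions_.empty()
  bool null_aware_partition_is_null = true;   // builder_->null_aware_partition() == NULL
  int null_probe_output_idx = -1;             // null_probe_output_idx_
  bool output_null_aware_probe_rows_running = false;  // output_null_aware_probe_rows_running_
  int probe_batch_pos = -1;                   // probe_batch_pos_
  int num_hash_partitions = 0;                // builder_->num_hash_partitions()
  bool spilled_partitions_empty = true;       // spilled_partitions_.empty()
};

// Decodes the implied state by testing the conditions in the order the issue
// lists them. The first matching check wins, so each flag's meaning depends
// on every check that precedes it.
std::string DescribeImplicitState(const ImplicitProbeFlags& f) {
  if (!f.output_build_partitions_empty) return "outputting build rows after probing";
  if (f.is_null_aware_anti_join) {
    if (f.null_aware_partition_is_null) return "eos";
    if (f.null_probe_output_idx >= 0) return "null probe rows being processed";
    if (f.output_null_aware_probe_rows_running) {
      return "null-aware partition being processed";
    }
  }
  if (f.probe_batch_pos != -1) return "processing probe batch";
  if (f.num_hash_partitions != 0) {
    return "have active hash partitions that are being probed";
  }
  if (f.spilled_partitions_empty) return "no more spilled partitions";
  // Hypothetical fallback, not in the issue's list: spilled partitions remain.
  return "spilled partitions remain to be processed";
}
```

For example, flags with probe_batch_pos = 3 decode to "processing probe batch" only while output_build_partitions_empty is true; flip that one field and the same probe_batch_pos value is ignored.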

      I think this would be a lot easier to follow if the state machine were explicit and documented, and it would make separating out the build side of a spilling hash join easier to get right.
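A minimal sketch of what an explicit, documented version could look like, assuming a scoped enum with one value per state and a single function that lists the legal transitions. The state names, the extra PROBE_COMPLETE state, and the transition rules (including the guess that null-aware probe rows are emitted before null probe rows) are hypothetical and are not taken from the change that resolved this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical explicit probe-side states. Each comment names the implicit
// condition from IMPALA-9127 that the state would replace.
enum class ProbeState {
  PROBING_BATCH,               // probe_batch_pos_ != -1
  PROBE_COMPLETE,              // active hash partitions fully probed (assumed state)
  OUTPUTTING_UNMATCHED_BUILD,  // !output_build_partitions_.empty()
  OUTPUTTING_NULL_AWARE,       // output_null_aware_probe_rows_running_
  OUTPUTTING_NULL_PROBE,       // null_probe_output_idx_ >= 0
  EOS                          // no spilled partitions and no null-aware work left
};

// Lists every legal transition in one place. The rules are an illustrative
// model, not Impala's actual behavior.
bool IsValidTransition(ProbeState from, ProbeState to) {
  switch (from) {
    case ProbeState::PROBING_BATCH:
      // Next probe batch, or probe input exhausted for the current partitions.
      return to == ProbeState::PROBING_BATCH || to == ProbeState::PROBE_COMPLETE;
    case ProbeState::PROBE_COMPLETE:
      // Emit unmatched build rows, start the next spilled partition,
      // move on to null-aware processing, or finish.
      return to == ProbeState::OUTPUTTING_UNMATCHED_BUILD ||
             to == ProbeState::PROBING_BATCH ||
             to == ProbeState::OUTPUTTING_NULL_AWARE || to == ProbeState::EOS;
    case ProbeState::OUTPUTTING_UNMATCHED_BUILD:
      return to == ProbeState::OUTPUTTING_UNMATCHED_BUILD ||
             to == ProbeState::PROBING_BATCH || to == ProbeState::EOS;
    case ProbeState::OUTPUTTING_NULL_AWARE:
      return to == ProbeState::OUTPUTTING_NULL_AWARE ||
             to == ProbeState::OUTPUTTING_NULL_PROBE;
    case ProbeState::OUTPUTTING_NULL_PROBE:
      return to == ProbeState::OUTPUTTING_NULL_PROBE || to == ProbeState::EOS;
    case ProbeState::EOS:
      return false;  // terminal state
  }
  return false;
}

// Checks that every consecutive pair in a recorded sequence of states is
// legal, the kind of DCHECK a GetNext() loop could run on each state change.
bool IsValidTrace(const std::vector<ProbeState>& trace) {
  for (std::size_t i = 1; i < trace.size(); ++i) {
    if (!IsValidTransition(trace[i - 1], trace[i])) return false;
  }
  return true;
}
```

With the transitions in one function, a reviewer can check the spilled-partition loop or the null-aware ordering by reading a single switch instead of reconstructing it from seven separate conditions.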

People

    • Assignee: Tim Armstrong (tarmstrong)
    • Reporter: Tim Armstrong (tarmstrong)
    • Votes: 0
    • Watchers: 2