Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.2
- None
- None
Description
While running the mem leak test with 4 concurrent clients on a release build, an impalad crashed in PartitionedHashJoinNode::NextSpilledProbeRowBatch() with the stack trace below. This can happen for RIGHT_OUTER, RIGHT_ANTI and FULL_OUTER joins when the hash table of the input partition is NULL.
#4 0x00007ff399ce496f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 NextBucket (this=0x0, ctx=0xb8fd110) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/hash-table.inline.h:80
#7 impala::HashTable::FirstUnmatched (this=0x0, ctx=0xb8fd110) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/hash-table.inline.h:62
#8 0x0000000000c16cc1 in impala::PartitionedHashJoinNode::NextSpilledProbeRowBatch (this=0x599d6000, state=<value optimized out>, out_batch=0x7fe8e9cbfd70) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:640
#9 0x0000000000c1f62f in impala::PartitionedHashJoinNode::GetNext (this=0x599d6000, state=0x15980700, out_batch=0x7fe8e9cbfd70, eos=0x7fe8e9cbfedf) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:814
The code has a DCHECK that hash_tbl_ is not NULL, but it did not fire because DCHECKs are compiled out of release builds, so the NULL pointer was dereferenced on the next line instead:
    // Done with this partition.
    if (join_op_ == TJoinOp::RIGHT_OUTER_JOIN ||
        join_op_ == TJoinOp::RIGHT_ANTI_JOIN ||
        join_op_ == TJoinOp::FULL_OUTER_JOIN) {
      // In case of right-outer, right-anti and full-outer joins, we move this partition
      // to the list of partitions that we need to output their unmatched build rows.
      DCHECK(output_build_partitions_.empty());
      DCHECK(input_partition_->hash_tbl_.get() != NULL);
      // <== Crash happened below because hash_tbl_ == NULL ==>
      hash_tbl_iterator_ = input_partition_->hash_tbl_->FirstUnmatched(ht_ctx_.get());
      output_build_partitions_.push_back(input_partition_);
    } else {
      // In any other case, just close the input partition.
      input_partition_->Close(out_batch);
      input_partition_ = NULL;
    }
    current_probe_row_ = NULL;
    probe_batch_pos_ = -1;
  }
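To illustrate the failure mode, here is a minimal, self-contained sketch of the same branch with a runtime NULL check in place of the debug-only assertion. It is not the actual Impala fix. `HashTable`, `Partition`, `JoinOp`, `Result` and `FinishPartition` are hypothetical, simplified stand-ins invented for this example, and returning an error when the hash table is missing is an assumption about one reasonable way to avoid the SEGV, not a description of what the resolved code does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for impala::HashTable: one "matched" flag per build row.
struct HashTable {
  std::vector<bool> matched;
  // Index of the first unmatched build row, or matched.size() if every row matched.
  std::size_t FirstUnmatched() const {
    for (std::size_t i = 0; i < matched.size(); ++i) {
      if (!matched[i]) return i;
    }
    return matched.size();
  }
};

// Hypothetical stand-in for a join partition. hash_tbl may be null,
// which is the condition reported in this issue.
struct Partition {
  std::unique_ptr<HashTable> hash_tbl;
  bool closed = false;
  void Close() { closed = true; }
};

enum class JoinOp { INNER, LEFT_OUTER, RIGHT_OUTER, RIGHT_ANTI, FULL_OUTER };
enum class Result { QUEUED, CLOSED, ERROR_NO_HASH_TABLE };

// True for the join types that must output unmatched build rows.
bool NeedsUnmatchedBuildRows(JoinOp op) {
  return op == JoinOp::RIGHT_OUTER || op == JoinOp::RIGHT_ANTI ||
         op == JoinOp::FULL_OUTER;
}

// Mirrors the "Done with this partition" branch, but checks the pointer at
// runtime so a release build reports an error instead of dereferencing NULL.
Result FinishPartition(JoinOp op, Partition* partition,
                       std::vector<Partition*>* output_build_partitions,
                       std::size_t* first_unmatched) {
  if (NeedsUnmatchedBuildRows(op)) {
    if (partition->hash_tbl == nullptr) {
      // Assumption: surface an error to the caller; the real fix may differ.
      return Result::ERROR_NO_HASH_TABLE;
    }
    *first_unmatched = partition->hash_tbl->FirstUnmatched();
    output_build_partitions->push_back(partition);
    return Result::QUEUED;
  }
  // Any other join type: just close the input partition.
  partition->Close();
  return Result::CLOSED;
}
```

The point of the sketch is only that a check which must hold in production cannot rely on a macro that release builds compile out. Whether the right response is an error, rebuilding the hash table, or treating every build row as unmatched depends on why the table is missing, which this report does not establish.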
Issue Links
- relates to: IMPALA-2168 SEGV in BufferedTupleStream::num_rows() in a query with very large, spilling ROJ (Resolved)