IMPALA-1929

Crash because PHJ::NextSpilledProbeRowBatch() tries to use a NULL hash_tbl


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2
    • Fix Version/s: Impala 2.3.0
    • Component/s: None
    • Labels: None

    Description

      While running the memory-leak test with 4 concurrent clients against a release build, an impalad crashed in PartitionedHashJoinNode::NextSpilledProbeRowBatch() with the stack trace below. This can happen for RIGHT_OUTER, RIGHT_ANTI and FULL_OUTER joins when the partition's hash table is NULL.

      #4  0x00007ff399ce496f in JVM_handle_linux_signal () from /usr/java/jdk1.7.0_67-cloudera/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  NextBucket (this=0x0, ctx=0xb8fd110) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/hash-table.inline.h:80
      #7  impala::HashTable::FirstUnmatched (this=0x0, ctx=0xb8fd110) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/hash-table.inline.h:62
      #8  0x0000000000c16cc1 in impala::PartitionedHashJoinNode::NextSpilledProbeRowBatch (this=0x599d6000, state=<value optimized out>, out_batch=0x7fe8e9cbfd70) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:640
      #9  0x0000000000c1f62f in impala::PartitionedHashJoinNode::GetNext (this=0x599d6000, state=0x15980700, out_batch=0x7fe8e9cbfd70, eos=0x7fe8e9cbfedf) at /usr/src/debug/impala-2.2.0-cdh5.4.1-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:814
      

      The code has a DCHECK that hash_tbl_ is not NULL, but DCHECKs are compiled out of release builds, so the check never fired and the NULL pointer was dereferenced instead.

          // Done with this partition.                                                                                                         
          if (join_op_ == TJoinOp::RIGHT_OUTER_JOIN || join_op_ == TJoinOp::RIGHT_ANTI_JOIN ||
              join_op_ == TJoinOp::FULL_OUTER_JOIN) {
            // In case of right-outer, right-anti and full-outer joins, we move this partition                                                 
            // to the list of partitions that we need to output their unmatched build rows.                                                    
            DCHECK(output_build_partitions_.empty());
            DCHECK(input_partition_->hash_tbl_.get() != NULL);
            // <== Crash happened below because hash_tbl_ == NULL ==>
            hash_tbl_iterator_ =
                input_partition_->hash_tbl_->FirstUnmatched(ht_ctx_.get());
            output_build_partitions_.push_back(input_partition_);
          } else {
            // In any other case, just close the input partition.                                                                              
            input_partition_->Close(out_batch);
            input_partition_ = NULL;
          }
          current_probe_row_ = NULL;
          probe_batch_pos_ = -1;
        }
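
To illustrate the difference between a debug-only DCHECK and a check that survives in release builds, here is a minimal sketch of a runtime guard. It is not the fix that was committed for this issue. The HashTable and Partition structs, the UnmatchedStatus enum and StartUnmatchedScan() are hypothetical, simplified stand-ins invented for the example. What the join node should actually do when the hash table is missing (skip the partition, rebuild the table, or return an error) is a separate decision that this report does not settle, so the sketch only reports the condition to the caller.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for the types in the excerpt above.
// They are NOT Impala's real HashTable or Partition classes.
struct HashTable {
  // matched[i] is true if build row i found at least one probe match.
  std::vector<bool> matched;

  // Returns the index of the first unmatched build row, or -1 if all matched.
  int FirstUnmatched() const {
    for (std::size_t i = 0; i < matched.size(); ++i) {
      if (!matched[i]) return static_cast<int>(i);
    }
    return -1;
  }
};

struct Partition {
  // Null when no hash table exists for this partition.
  std::unique_ptr<HashTable> hash_tbl;
};

enum class UnmatchedStatus { kOk, kNoHashTable };

// Runtime guard that stays active in release builds, unlike a DCHECK.
// On kOk, *first_unmatched holds the row index (or -1 if every row matched).
// On kNoHashTable, *first_unmatched is left untouched.
UnmatchedStatus StartUnmatchedScan(const Partition& partition,
                                   int* first_unmatched) {
  if (partition.hash_tbl == nullptr) return UnmatchedStatus::kNoHashTable;
  *first_unmatched = partition.hash_tbl->FirstUnmatched();
  return UnmatchedStatus::kOk;
}
```

A caller would branch on the returned status before touching the iterator, so a partition without a hash table produces a handled condition rather than a call on a NULL this pointer like the one in frames #6 and #7 of the trace.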
      

            People

              Assignee: Ippokratis Pandis
              Reporter: Ippokratis Pandis