IMPALA-2829

SEGV in AnalyticEvalNode touching NULL input_stream_


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0, Impala 2.5.0
    • Fix Version/s: Impala 2.5.0, Impala 2.3.2
    • Component/s: Backend

    Description

      A crash was reported with the following stack trace:

      Stack: [0x00007fe1c7c8b000,0x00007fe1c848c000],  sp=0x00007fe1c8489bd0,  free space=8186k
      Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
      C  [impalad+0x128fb26]  impala::BufferedTupleStream::rows_returned() const+0xc
      C  [impalad+0x12bd3b3]  impala::AnalyticEvalNode::GetNext(impala::RuntimeState*, impala::RowBatch*, bool*)+0x821
      C  [impalad+0x115f478]  impala::PlanFragmentExecutor::GetNextInternal(impala::RowBatch**)+0xec
      C  [impalad+0x115dc92]  impala::PlanFragmentExecutor::OpenInternal()+0x272
      C  [impalad+0x115d958]  impala::PlanFragmentExecutor::Open()+0x39e
      C  [impalad+0xf30d88]  impala::FragmentMgr::FragmentExecState::Exec()+0x26
      C  [impalad+0xf293a8]  impala::FragmentMgr::FragmentExecThread(impala::FragmentMgr::FragmentExecState*)+0x4c
      

      The issue may have been introduced by a recent fix:
      "IMPALA-2378: Part 2, IMPALA-2481: delete BufferedTupleStreams attached to batches"
      (commit 916f3b29)

      I can reproduce this with the following query:

      select max(t3.c1), max(t3.c2)
      from (
        select
        avg( t1.timestamp_col )
          over (order by t1.id, t2.id rows between 5000 following and 50000 following) c1,
        avg( t2.timestamp_col )
          over (order by t1.id, t2.id rows between 5000 following and 50000 following) c2
        from alltypesagg t1 join alltypesagg t2 where t1.int_col = t2.int_col
      ) t3;
      

      The issue has to do with memory that gets handed off to the output row batch. Normally, memory is allocated from a mem pool and then transferred to the output row batch once it reaches 8 MB. This may happen many times during execution of the analytic node, and it works fine in the general case. However, when this transfer to the output row batch is supposed to happen at eos, we end up attempting the transfer twice, and the second attempt touches a NULL input_stream_ pointer.
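
      To make the failure mode concrete, below is a minimal, hypothetical C++ sketch (ToyStream, ToyBatch, and ToyNode are invented names; this is not Impala's actual AnalyticEvalNode code, only the general shape of the bug): ownership of the buffered stream moves to the output row batch on the first eos transfer, and the second transfer then calls a method through the now-NULL pointer, analogous to the rows_returned() frame in the stack above.

      // Hypothetical, heavily simplified sketch; not Impala's real classes.
      #include <cstdio>
      #include <memory>
      #include <utility>

      struct ToyStream {
        long rows = 0;
        long rows_returned() const { return rows; }  // reads a member, so a null 'this' crashes
      };

      struct ToyBatch {
        // Takes ownership of the stream's buffers; the caller must stop using it.
        void AcquireStream(std::unique_ptr<ToyStream> s) { owned_ = std::move(s); }
        std::unique_ptr<ToyStream> owned_;
      };

      struct ToyNode {
        std::unique_ptr<ToyStream> input_stream_ = std::make_unique<ToyStream>();

        // Models the hand-off that is supposed to run exactly once at eos.
        void TransferAtEos(ToyBatch* out) {
          // On the second invocation input_stream_ is already null because
          // ownership moved to the output batch, so this dereference crashes.
          std::printf("rows=%ld\n", input_stream_->rows_returned());
          out->AcquireStream(std::move(input_stream_));  // leaves input_stream_ null
        }
      };

      int main() {
        ToyNode node;
        ToyBatch batch;
        node.TransferAtEos(&batch);  // first transfer at eos: fine
        node.TransferAtEos(&batch);  // second transfer: touches NULL input_stream_
        return 0;
      }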

      This shouldn't happen frequently (none of our existing tests hit it), but it is unfortunately hard to predict, because it depends on both the query and the data.

      There isn't an easy general workaround, but small changes that affect the cardinality of the data or the output tuple size of the analytic eval node may shift when the data transfer happens and thus avoid the crash.


            People

              Assignee: Matthew Jacobs (mjacobs)
              Reporter: Matthew Jacobs (mjacobs)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: