Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Duplicate
- Impala 2.6.0
- None
Description
The nightly stress test caused and detected an impalad crash.
There's a full cluster collection of logs, binaries, and core on impala-desktop, in the dev home directory with this Jira ID as the name.
Note that this test has been running for weeks now and this is the first detected crash.
Here's the backtrace:
(gdb) bt
#0 0x0000003fa8232625 in raise () from /lib64/libc.so.6
#1 0x0000003fa8233e05 in abort () from /lib64/libc.so.6
#2 0x00007f7f2b0edc55 in os::abort(bool) () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
#3 0x00007f7f2b26fcd7 in VMError::report_and_die() () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
#4 0x00007f7f2b0f2b6f in JVM_handle_linux_signal () from /opt/toolchain/sun-jdk-64bit-1.7.0.75/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x0000000000c423d9 in SetNull (this=0x7f7c362e0380, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/tuple.h:124
#7 impala::UnnestNode::Open (this=0x7f7c362e0380, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/unnest-node.cc:100
#8 0x0000000000c52e8f in impala::BlockingJoinNode::Open (this=0x7f6d2dbd0f80, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/blocking-join-node.cc:209
#9 0x0000000000c121da in impala::NestedLoopJoinNode::Open (this=0x7f6d2dbd0f80, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/nested-loop-join-node.cc:61
#10 0x0000000000c3addb in impala::SubplanNode::GetNext (this=0x7f7677b0adc0, state=0x156eff000, row_batch=0x7f76145da0a0, eos=0x7f6d2dbd0e61) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/subplan-node.cc:124
#11 0x0000000000c52f29 in impala::BlockingJoinNode::Open (this=0x7f6d2dbd0d00, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/blocking-join-node.cc:220
#12 0x0000000000c121da in impala::NestedLoopJoinNode::Open (this=0x7f6d2dbd0d00, state=0x156eff000) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/nested-loop-join-node.cc:61
#13 0x0000000000c3addb in impala::SubplanNode::GetNext (this=0x7f7351b5b760, state=0x156eff000, row_batch=0x7f76145db840, eos=0x7f734f769361) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/subplan-node.cc:124
#14 0x0000000000c22ace in impala::PartitionedHashJoinNode::NextProbeRowBatch (this=0x7f734f769200, state=0x156eff000, out_batch=0x7f71fdf64360) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:754
#15 0x0000000000c2a57d in impala::PartitionedHashJoinNode::GetNext (this=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:981
#16 0x0000000000c22ace in impala::PartitionedHashJoinNode::NextProbeRowBatch (this=0x7f78b6957b00, state=0x156eff000, out_batch=0x7f71fdf65d40) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:754
#17 0x0000000000c2a57d in impala::PartitionedHashJoinNode::GetNext (this=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:981
#18 0x0000000000c22ace in impala::PartitionedHashJoinNode::NextProbeRowBatch (this=0x16815e880, state=0x156eff000, out_batch=0x7f71331f8760) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:754
#19 0x0000000000c2a57d in impala::PartitionedHashJoinNode::GetNext (this=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-hash-join-node.cc:981
#20 0x0000000000c162ee in impala::PartitionedAggregationNode::GetRowsStreaming (this=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-aggregation-node.cc:483
#21 0x0000000000c1b801 in impala::PartitionedAggregationNode::GetNext (this=0x7f7c01478000, state=0x156eff000, row_batch=0x7f7a52862120, eos=0x7f7a34057099) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/exec/partitioned-aggregation-node.cc:385
#22 0x0000000000d309cb in impala::PlanFragmentExecutor::GetNextInternal (this=0x7f7a34056f70, batch=0x7f73be176040) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/plan-fragment-executor.cc:492
#23 0x0000000000d30f5f in impala::PlanFragmentExecutor::OpenInternal (this=0x7f7a34056f70) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/plan-fragment-executor.cc:365
#24 0x0000000000d3175b in impala::PlanFragmentExecutor::Open (this=0x7f7a34056f70) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/runtime/plan-fragment-executor.cc:328
#25 0x0000000000ad7608 in impala::FragmentMgr::FragmentExecState::Exec (this=0x7f7a34056d00) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/service/fragment-exec-state.cc:54
#26 0x0000000000acefca in impala::FragmentMgr::FragmentThread (this=0xdb0fc80, fragment_instance_id=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/service/fragment-mgr.cc:86
#27 0x0000000000ad02da in operator() (function_obj_ptr=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
#28 operator()<boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list0> (function_obj_ptr=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
#29 operator() (function_obj_ptr=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#30 boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf1<void, impala::FragmentMgr, impala::TUniqueId>, boost::_bi::list2<boost::_bi::value<impala::FragmentMgr*>, boost::_bi::value<impala::TUniqueId> > >, void>::invoke (function_obj_ptr=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
#31 0x0000000000b71dd7 in operator() (name=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
#32 impala::Thread::SuperviseThread (name=Unhandled dwarf expression opcode 0xf3 ) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/be/src/util/thread.cc:316
#33 0x0000000000b72714 in operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0> (this=0x188d3cc00) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
#34 operator() (this=0x188d3cc00) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
#35 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x188d3cc00) at /usr/src/debug/impala-2.6.0-cdh5.8.0-SNAPSHOT/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
#36 0x0000000000dc02da in thread_proxy ()
#37 0x0000003fa86079d1 in start_thread () from /lib64/libpthread.so.0
#38 0x0000003fa82e88fd in clone () from /lib64/libc.so.6
(gdb)
Issue Links
- duplicates IMPALA-3528 Memory of scratch batch should be transferred when closing a Parquet scanner thread. (Resolved)
The problem is that it's trying to dereference a bad tuple:
The pointer is bad, and it comes from the subplan's input row.
The subplan just copies the input row from its input batch.
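To illustrate why a shallow row copy passes a bad tuple pointer downstream unchanged, here is a minimal sketch. The Tuple, TupleRow and CopyRow definitions below are simplified, hypothetical stand-ins and not Impala's actual classes; the only assumption taken from the backtrace is that SetNull writes through the tuple pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a tuple: a view over raw bytes.
struct Tuple {
  // Sets a null-indicator bit by writing through `this`. If `this` dangles,
  // this write is what faults (compare frame #6 in the backtrace).
  void SetNull(int byte_offset, uint8_t bit_mask) {
    reinterpret_cast<uint8_t*>(this)[byte_offset] |= bit_mask;
  }
};

// Hypothetical stand-in for a row: just a list of tuple pointers.
struct TupleRow {
  std::vector<Tuple*> tuples;
};

// Shallow copy: only the pointers are copied, never the tuple bytes, so the
// copy is exactly as valid (or invalid) as the row it was copied from.
TupleRow CopyRow(const TupleRow& src) { return src; }

// Returns true if the copied row aliases the input row's tuple memory.
bool ShallowCopySharesTuple() {
  std::vector<uint8_t> buffer(16, 0);  // memory owned by the "input batch"
  TupleRow input_row;
  input_row.tuples.push_back(reinterpret_cast<Tuple*>(buffer.data()));

  TupleRow copied = CopyRow(input_row);
  copied.tuples[0]->SetNull(0, 0x1);  // safe here only because buffer is alive

  return copied.tuples[0] == input_row.tuples[0] && (buffer[0] & 0x1) != 0;
}
```

In this sketch the write succeeds because the buffer outlives both rows. If the buffer had been freed before SetNull ran, the same call would touch invalid memory, which would be consistent with the crash site above.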
The failing query was TPC-H nested query 7:
This has the plan (in the planner tests anyway):
So it seems the bad row is somehow being produced by the Parquet scan node.
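The title of the linked IMPALA-3528 suggests that the scanner's scratch memory has to be handed over to the output batch when the scanner thread closes. A minimal sketch of that ownership pattern follows. MemPool, RowBatch, Scanner and their methods are hypothetical simplifications written for illustration, not Impala's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical pool: owns chunks and frees them when destroyed.
struct MemPool {
  std::vector<std::unique_ptr<uint8_t[]>> chunks;

  uint8_t* Allocate(std::size_t n) {
    chunks.emplace_back(new uint8_t[n]());  // zero-initialized chunk
    return chunks.back().get();
  }

  // Takes ownership of every chunk in `src`; chunk addresses do not change.
  void AcquireData(MemPool* src) {
    for (auto& chunk : src->chunks) chunks.push_back(std::move(chunk));
    src->chunks.clear();
  }
};

// Hypothetical batch: rows point into memory that the batch's pool must own.
struct RowBatch {
  MemPool pool;
  std::vector<uint8_t*> tuples;
};

// Hypothetical scanner that materializes tuples into its own scratch pool.
struct Scanner {
  MemPool scratch_pool;

  void Produce(RowBatch* out, uint8_t value) {
    uint8_t* tuple = scratch_pool.Allocate(8);
    tuple[0] = value;
    out->tuples.push_back(tuple);
  }

  // Without this transfer, destroying the scanner would free memory that
  // `out->tuples` still points at.
  void Close(RowBatch* out) { out->pool.AcquireData(&scratch_pool); }
};

// Reads a tuple after the scanner is gone; valid only because of Close().
uint8_t ReadAfterScannerClose() {
  RowBatch batch;
  {
    Scanner scanner;
    scanner.Produce(&batch, 42);
    scanner.Close(&batch);
  }  // scanner and its (now empty) scratch pool are destroyed here
  return batch.tuples[0][0];
}
```

If the Close() call were skipped in this sketch, the tuple pointer in the batch would dangle once the scanner went out of scope, the kind of bad row the comment above attributes to the scan node.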