IMPALA-5173

Crash when NestedLoopJoin has HashJoin feeding directly into its right side


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Backend
    • Labels: None

    Description

      The repro is:

      use tpch_parquet;
      set num_nodes=1;
      set max_block_mgr_memory=300m;
      select straight_join * 
      from (select * from orders limit 1) o,
         (select l1.* from lineitem l1 inner join lineitem l2 on l1.l_orderkey = l2.l_orderkey) v;
      

      Plan:

      +-------------------------------------------------------+
      | Explain String                                        |
      +-------------------------------------------------------+
      | Estimated Per-Host Requirements: Memory=0B VCores=0   |
      |                                                       |
      | PLAN-ROOT SINK                                        |
      | |                                                     |
      | 04:NESTED LOOP JOIN [CROSS JOIN]                      |
      | |  hosts=2 per-host-mem=unavailable                   |
      | |  tuple-ids=0,2,3 row-size=462B cardinality=6001215  |
      | |                                                     |
      | |--03:HASH JOIN [INNER JOIN]                          |
      | |  |  hash predicates: l1.l_orderkey = l2.l_orderkey  |
      | |  |  runtime filters: RF000 <- l2.l_orderkey         |
      | |  |  hosts=3 per-host-mem=unavailable                |
      | |  |  tuple-ids=2,3 row-size=271B cardinality=6001215 |
      | |  |                                                  |
      | |  |--02:SCAN HDFS [tpch_parquet.lineitem l2]         |
      | |  |     partitions=1/1 files=3 size=193.61MB         |
      | |  |     table stats: 6001215 rows total              |
      | |  |     column stats: all                            |
      | |  |     hosts=3 per-host-mem=unavailable             |
      | |  |     tuple-ids=3 row-size=8B cardinality=6001215  |
      | |  |                                                  |
      | |  01:SCAN HDFS [tpch_parquet.lineitem l1]            |
      | |     partitions=1/1 files=3 size=193.61MB            |
      | |     runtime filters: RF000 -> l1.l_orderkey         |
      | |     table stats: 6001215 rows total                 |
      | |     column stats: all                               |
      | |     hosts=3 per-host-mem=unavailable                |
      | |     tuple-ids=2 row-size=263B cardinality=6001215   |
      | |                                                     |
      | 00:SCAN HDFS [tpch_parquet.orders]                    |
      |    partitions=1/1 files=2 size=54.00MB                |
      |    table stats: 1500000 rows total                    |
      |    column stats: all                                  |
      |    limit: 1                                           |
      |    hosts=2 per-host-mem=unavailable                   |
      |    tuple-ids=0 row-size=191B cardinality=1            |
      +-------------------------------------------------------+
      
      

      Summary before crash:

      +---------------------+--------+----------+----------+--------+------------+-----------+---------------+--------------------------+
      | Operator            | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail                   |
      +---------------------+--------+----------+----------+--------+------------+-----------+---------------+--------------------------+
      | 04:NESTED LOOP JOIN | 1      | 0ns      | 0ns      | 0      | 6.00M      | 6.19 GB   | -1 B          | CROSS JOIN               |
      | |--03:HASH JOIN     | 1      | 4.17s    | 4.17s    | 13.28M | 6.00M      | 312.07 MB | -1 B          | INNER JOIN               |
      | |  |--02:SCAN HDFS  | 1      | 33.74ms  | 33.74ms  | 6.00M  | 6.00M      | 14.98 MB  | -1 B          | tpch_parquet.lineitem l2 |
      | |  01:SCAN HDFS     | 1      | 795.87ms | 795.87ms | 6.00M  | 6.00M      | 259.72 MB | -1 B          | tpch_parquet.lineitem l1 |
      | 00:SCAN HDFS        | 1      | 291.46us | 291.46us | 0      | 1          | 0 B       | -1 B          | tpch_parquet.orders      |
      +---------------------+--------+----------+----------+--------+------------+-----------+---------------+--------------------------+
      

      This DCHECK is hit:

      F0405 16:02:34.806744 21512 mem-tracker.cc:194] Check failed: consumption_->current_value() == 0 (17367040 vs. 0) Hash Join Builder (join_node_id=3)
          @          0x1396425  impala::MemTracker::~MemTracker()
          @          0x137ee3c  boost::checked_delete<>()
          @          0x137cd15  boost::scoped_ptr<>::~scoped_ptr()
          @          0x137d23f  boost::scoped_ptr<>::reset()
          @          0x1855be7  impala::DataSink::Close()
          @          0x184692d  impala::PhjBuilder::Close()
          @          0x17c1190  impala::PartitionedHashJoinNode::Close()
          @          0x1823edd  impala::BlockingJoinNode::ProcessBuildInputAsync()
          @          0x18276df  boost::_mfi::mf3<>::operator()()
          @          0x182758b  boost::_bi::list4<>::operator()<>()
          @          0x18272d7  boost::_bi::bind_t<>::operator()()
          @          0x1826fd0  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x137bd48  boost::function0<>::operator()()
          @          0x162e363  impala::Thread::SuperviseThread()
          @          0x1636d3e  boost::_bi::list4<>::operator()<>()
          @          0x1636c81  boost::_bi::bind_t<>::operator()()
          @          0x1636c44  boost::detail::thread_data<>::run()
          @          0x1af270a  thread_proxy
          @     0x7fc2538ff6ba  start_thread
          @     0x7fc25363582d  clone
      
      Hash Join Builder (join_node_id=3): Total=16.56 MB Peak=310.08 MB
      *** Check failure stack trace: ***
          @          0x28c208d  google::LogMessage::Fail()
          @          0x28c3932  google::LogMessage::SendToLog()
          @          0x28c1a67  google::LogMessage::Flush()
          @          0x28c502e  google::LogMessageFatal::~LogMessageFatal()
          @          0x13964c7  impala::MemTracker::~MemTracker()
          @          0x137ee3c  boost::checked_delete<>()
          @          0x137cd15  boost::scoped_ptr<>::~scoped_ptr()
          @          0x137d23f  boost::scoped_ptr<>::reset()
          @          0x1855be7  impala::DataSink::Close()
          @          0x184692d  impala::PhjBuilder::Close()
          @          0x17c1190  impala::PartitionedHashJoinNode::Close()
          @          0x1823edd  impala::BlockingJoinNode::ProcessBuildInputAsync()
          @          0x18276df  boost::_mfi::mf3<>::operator()()
          @          0x182758b  boost::_bi::list4<>::operator()<>()
          @          0x18272d7  boost::_bi::bind_t<>::operator()()
          @          0x1826fd0  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @          0x137bd48  boost::function0<>::operator()()
          @          0x162e363  impala::Thread::SuperviseThread()
          @          0x1636d3e  boost::_bi::list4<>::operator()<>()
          @          0x1636c81  boost::_bi::bind_t<>::operator()()
          @          0x1636c44  boost::detail::thread_data<>::run()
          @          0x1af270a  thread_proxy
          @     0x7fc2538ff6ba  start_thread
          @     0x7fc25363582d  clone
      Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n 
      Wrote minidump to /home/tarmstrong/Impala/incubator-impala/logs/cluster/minidumps/impalad/25a8ecae-b4e1-9645-449882d5-58049f7f.dmp
      

      Backtrace:

      (gdb) bt
      #0  0x00007fc253564428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
      #1  0x00007fc25356602a in __GI_abort () at abort.c:89
      #2  0x00000000028cb614 in google::DumpStackTraceAndExit() ()
      #3  0x00000000028c208d in google::LogMessage::Fail() ()
      #4  0x00000000028c3932 in google::LogMessage::SendToLog() ()
      #5  0x00000000028c1a67 in google::LogMessage::Flush() ()
      #6  0x00000000028c502e in google::LogMessageFatal::~LogMessageFatal() ()
      #7  0x00000000013964c7 in impala::MemTracker::~MemTracker (this=0x9b1c780, __in_chrg=<optimized out>) at be/src/runtime/mem-tracker.cc:194
      #8  0x000000000137ee3c in boost::checked_delete<impala::MemTracker> (x=0x9b1c780) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/core/checked_delete.hpp:34
      #9  0x000000000137cd15 in boost::scoped_ptr<impala::MemTracker>::~scoped_ptr (this=0x7fc1b82607c0, __in_chrg=<optimized out>) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:82
      #10 0x000000000137d23f in boost::scoped_ptr<impala::MemTracker>::reset (this=0x850d6e0, p=0x0) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/smart_ptr/scoped_ptr.hpp:88
      #11 0x0000000001855be7 in impala::DataSink::Close (this=0x850d6c0, state=0x81b1800) at be/src/exec/data-sink.cc:197
      #12 0x000000000184692d in impala::PhjBuilder::Close (this=0x850d6c0, state=0x81b1800) at be/src/exec/partitioned-hash-join-builder.cc:235
      #13 0x00000000017c1190 in impala::PartitionedHashJoinNode::Close (this=0x880ae00, state=0x81b1800) at be/src/exec/partitioned-hash-join-node.cc:248
      #14 0x0000000001823edd in impala::BlockingJoinNode::ProcessBuildInputAsync (this=0xda33e00, state=0x81b1800, build_sink=0x8d118e0, status=0x7fc1baa6ba00) at be/src/exec/blocking-join-node.cc:161
      #15 0x00000000018276df in boost::_mfi::mf3<void, impala::BlockingJoinNode, impala::RuntimeState*, impala::DataSink*, impala::Promise<impala::Status>*>::operator() (this=0x8e19b60, p=0xda33e00, a1=0x81b1800, a2=0x8d118e0,
          a3=0x7fc1baa6ba00) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/mem_fn_template.hpp:393
      #16 0x000000000182758b in boost::_bi::list4<boost::_bi::value<impala::BlockingJoinNode*>, boost::_bi::value<impala::RuntimeState*>, boost::_bi::value<impala::DataSink*>, boost::_bi::value<impala::Promise<impala::Status>*> >::operator()<boost::_mfi::mf3<void, impala::BlockingJoinNode, impala::RuntimeState*, impala::DataSink*, impala::Promise<impala::Status>*>, boost::_bi::list0> (this=0x8e19b70, f=..., a=...)
          at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
      #17 0x00000000018272d7 in boost::_bi::bind_t<void, boost::_mfi::mf3<void, impala::BlockingJoinNode, impala::RuntimeState*, impala::DataSink*, impala::Promise<impala::Status>*>, boost::_bi::list4<boost::_bi::value<impala::BlockingJoinNode*>, boost::_bi::value<impala::RuntimeState*>, boost::_bi::value<impala::DataSink*>, boost::_bi::value<impala::Promise<impala::Status>*> > >::operator() (this=0x8e19b60)
          at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
      #18 0x0000000001826fd0 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf3<void, impala::BlockingJoinNode, impala::RuntimeState*, impala::DataSink*, impala::Promise<impala::Status>*>, boost::_bi::list4<boost::_bi::value<impala::BlockingJoinNode*>, boost::_bi::value<impala::RuntimeState*>, boost::_bi::value<impala::DataSink*>, boost::_bi::value<impala::Promise<impala::Status>*> > >, void>::invoke
          (function_obj_ptr=...) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:153
      #19 0x000000000137bd48 in boost::function0<void>::operator() (this=0x7fc1b8260d60) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/function/function_template.hpp:767
      #20 0x000000000162e363 in impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*) (name="build thread", category="NestedLoopJoinNode", functor=...,
          thread_started=0x7fc1baa6b7f0) at be/src/util/thread.cc:325
      #21 0x0000000001636d3e in boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> >::operator()<void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list0&, int) (this=0xda309c0, f=@0xda309b8: 0x162e09e <impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*)>, a=...)
          at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind.hpp:457
      #22 0x0000000001636c81 in boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > >::operator()() (this=0xda309b8) at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/bind/bind_template.hpp:20
      #23 0x0000000001636c44 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::Promise<long>*), boost::_bi::list4<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::Promise<long>*> > > >::run() (this=0xda30800)
          at /opt/Impala-Toolchain/boost-1.57.0-p1/include/boost/thread/detail/thread.hpp:116
      #24 0x0000000001af270a in thread_proxy ()
      #25 0x00007fc2538ff6ba in start_thread (arg=0x7fc1b8261700) at pthread_create.c:333
      #26 0x00007fc25363582d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

The root cause (I believe) is that the nested loop join accumulates the row batches from its right child, and those batches carry BufferedBlockMgr::Blocks attached by the hash join, so the blocks are still charged to the hash join builder's MemTracker when the builder is closed.

          People

            Assignee: Tim Armstrong (tarmstrong)
            Reporter: Tim Armstrong (tarmstrong)
