IMPALA-2375

Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.3.0
    • Fix Version/s: Impala 2.5.0
    • Component/s: Backend

    Description

      After properly configuring our Jenkins job to run with the legacy join and agg (IMPALA-2337), we see crashes/failures on that job.

      Example run:
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Builds%20-%202.3.0%20Release/job/impala-cdh5.5.x-non-partitioned-hash-and-aggs/13/

      We must investigate and fix all issues because we decided to continue to support running Impala with the legacy join/agg.

      I was able to repro a crash locally.
      1. Start Impala with:

      ./start-impala-cluster.py --impalad_args="--enable_partitioned_aggregation=false --enable_partitioned_hash_join=false"
      

      2. Run the end-to-end test suite:

      ./tests/run-tests.py
      

      I got a core dump and dug in; this is the query that crashes:

      select a.int_col, count(b.int_col) int_sum from functional.alltypesagg a
      join (select * from functional.alltypes
            where year=2009 and month=1 order by int_col limit 2500
            union all
            select * from functional.alltypes
            where year=2009 and month=2 limit 3000) b
      on (a.int_col = b.int_col)
      group by a.int_col
      order by int_sum;
      

      Here's the backtrace:

      (gdb) bt
      #0  0x00007f07a7156425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00007f07a7159b8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2  0x00007f07a8c5fac5 in os::abort(bool) () from /home/abehm/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
      #3  0x00007f07a8dbf137 in VMError::report_and_die() () from /home/abehm/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
      #4  0x00007f07a8c635e0 in JVM_handle_linux_signal () from /home/abehm/jdk1.7.0_25/jre/lib/amd64/server/libjvm.so
      #5  <signal handler called>
      #6  0x00000000012298bc in impala::TupleDescriptor::byte_size (this=0x0)
          at /home/abehm/impala/be/src/runtime/descriptors.h:337
      #7  0x00000000016cea98 in impala::AggregationNode::Close (this=0x137138000, state=0x19e86d00)
          at /home/abehm/impala/be/src/exec/aggregation-node.cc:287
      #8  0x00000000015cbe67 in impala::ExecNode::Close (this=0x19186c00, state=0x19e86d00)
          at /home/abehm/impala/be/src/exec/exec-node.cc:179
      #9  0x00000000016bf9f6 in impala::SortNode::Close (this=0x19186c00, state=0x19e86d00)
          at /home/abehm/impala/be/src/exec/sort-node.cc:134
      #10 0x0000000001595ce7 in impala::PlanFragmentExecutor::Close (this=0x19e86628)
          at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:573
      #11 0x000000000158fe87 in impala::PlanFragmentExecutor::~PlanFragmentExecutor (this=0x19e86628, 
          __in_chrg=<optimized out>) at /home/abehm/impala/be/src/runtime/plan-fragment-executor.cc:72
      #12 0x000000000137e815 in impala::FragmentMgr::FragmentExecState::~FragmentExecState (this=0x19e86400, 
          __in_chrg=<optimized out>) at /home/abehm/impala/be/src/service/fragment-exec-state.h:42
      #13 0x000000000137f82a in boost::checked_delete<impala::FragmentMgr::FragmentExecState> (x=0x19e86400)
          at /usr/include/boost/checked_delete.hpp:34
      #14 0x0000000001381c2c in boost::detail::sp_counted_impl_p<impala::FragmentMgr::FragmentExecState>::dispose (
          this=0xb3414e0) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
      #15 0x0000000000f80904 in boost::detail::sp_counted_base::release (this=0xb3414e0)
          at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
      #16 0x0000000000f8097d in boost::detail::shared_count::~shared_count (this=0x7f0727ac0fb8, __in_chrg=<optimized out>)
          at /usr/include/boost/smart_ptr/detail/shared_count.hpp:217
      #17 0x00000000012fa062 in boost::shared_ptr<impala::FragmentMgr::FragmentExecState>::~shared_ptr (
          this=0x7f0727ac0fb0, __in_chrg=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:168
      #18 0x000000000137d9ee in impala::FragmentMgr::ExecPlanFragment (this=0xa327b60, exec_params=...)
          at /home/abehm/impala/be/src/service/fragment-mgr.cc:65
      #19 0x00000000012f9f91 in impala::ImpalaInternalService::ExecPlanFragment (this=0x8f93b30, return_val=..., params=...)
          at /home/abehm/impala/be/src/service/impala-internal-service.h:37
      #20 0x00000000014c4208 in impala::ImpalaInternalServiceProcessor::process_ExecPlanFragment (this=0xa327b00, seqid=0, 
          iprot=0x16f728c0, oprot=0x16f73580, callContext=0xffb5800)
          at /home/abehm/impala/be/generated-sources/gen-cpp/ImpalaInternalService.cpp:949
      #21 0x00000000014c3f3f in impala::ImpalaInternalServiceProcessor::dispatchCall (this=0xa327b00, iprot=0x16f728c0, 
          oprot=0x16f73580, fname=..., seqid=0, callContext=0xffb5800)
          at /home/abehm/impala/be/generated-sources/gen-cpp/ImpalaInternalService.cpp:922
      #22 0x00000000012f0a6c in apache::thrift::TDispatchProcessor::process (this=0xa327b00, in=..., out=..., 
          connectionContext=0xffb5800)
      

      Note that running the above query by itself works fine.
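
      Frame #6 of the backtrace shows TupleDescriptor::byte_size being called with this=0x0 from AggregationNode::Close, and frames #17 and #18 show the fragment state being destroyed from inside FragmentMgr::ExecPlanFragment. One possible reading, which is an assumption and not something this report confirms, is that Close() ran on a node whose tuple descriptor had never been set. The sketch below illustrates a Close() that tolerates that situation. Every name in it (AggNodeSketch, intermediate_tuple_desc_, CloseWithoutPrepare, CloseAfterPrepare) is hypothetical and simplified; it is not the Impala code and not necessarily how the issue was fixed.

```cpp
#include <cassert>

// Hypothetical stand-in for impala::TupleDescriptor; only byte_size() is modeled.
class TupleDescriptor {
 public:
  explicit TupleDescriptor(int byte_size) : byte_size_(byte_size) {}
  int byte_size() const { return byte_size_; }

 private:
  int byte_size_;
};

// Hypothetical, heavily simplified aggregation node (not Impala's AggregationNode).
class AggNodeSketch {
 public:
  // Models a successful Prepare(): the descriptor becomes available.
  void Prepare(const TupleDescriptor* desc) {
    assert(desc != nullptr);
    intermediate_tuple_desc_ = desc;
  }

  // Close() may be reached even if Prepare() never ran, for example when the
  // fragment is torn down early. Returns the number of bytes it would release,
  // or 0 when there is nothing to release.
  int Close() {
    if (closed_) return 0;  // Make repeated Close() calls harmless.
    closed_ = true;
    // Guard: do not dereference a descriptor that was never set.
    if (intermediate_tuple_desc_ == nullptr) return 0;
    return intermediate_tuple_desc_->byte_size();
  }

 private:
  const TupleDescriptor* intermediate_tuple_desc_ = nullptr;
  bool closed_ = false;
};

// Teardown without Prepare(): the guard keeps Close() from touching a null pointer.
int CloseWithoutPrepare() {
  AggNodeSketch node;
  return node.Close();
}

// Normal lifecycle: Prepare() followed by Close().
int CloseAfterPrepare() {
  TupleDescriptor desc(16);
  AggNodeSketch node;
  node.Prepare(&desc);
  return node.Close();
}
```

      Under that assumption, making Close() safe to call at any point in a node's lifecycle avoids the null dereference; whether this matches the actual root cause would need to be checked against aggregation-node.cc:287.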

          People

            Assignee: mjacobs Matthew Jacobs
            Reporter: alex.behm Alexander Behm