DCHECK in ExprContext::Clone() because the context has not been opened.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      Failed build:
      http://sandbox.jenkins.cloudera.com/view/Impala/view/Evergreen-asf-master/job/impala-asf-master-core/395/

      Stack:

      #0  0x0000003c1b4328e5 in raise () from /lib64/libc.so.6
      #1  0x0000003c1b4340c5 in abort () from /lib64/libc.so.6
      #2  0x0000000002819874 in google::DumpStackTraceAndExit() ()
      #3  0x0000000002812d0d in google::LogMessage::Fail() ()
      #4  0x0000000002815636 in google::LogMessage::SendToLog() ()
      #5  0x000000000281282d in google::LogMessage::Flush() ()
      #6  0x00000000028160de in google::LogMessageFatal::~LogMessageFatal() ()
      #7  0x000000000187845c in impala::ExprContext::Clone (this=0x19b3f600, state=0x18217200, new_ctx=0x13a45df0) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exprs/expr-context.cc:99
      #8  0x000000000186dd9b in impala::Expr::CloneIfNotExists (ctxs=..., state=0x18217200, new_ctxs=0x7fd75ec68720) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exprs/expr.cc:406
      #9  0x0000000001690542 in impala::HdfsScanNodeBase::Open (this=0xd445100, state=0x18217200) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node-base.cc:384
      #10 0x0000000001683898 in impala::HdfsScanNode::Open (this=0xd445100, state=0x18217200) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/hdfs-scan-node.cc:200
      #11 0x00000000017d1414 in impala::BlockingJoinNode::ConstructBuildAndOpenProbe (this=0x1a2271000, state=0x18217200, build_sink=0xad85600) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/blocking-join-node.cc:209
      #12 0x00000000016faf59 in impala::NestedLoopJoinNode::Open (this=0x1a2271000, state=0x18217200) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/nested-loop-join-node.cc:83
      #13 0x000000000174a9db in impala::PartitionedAggregationNode::Open (this=0xf00ed00, state=0x18217200) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/exec/partitioned-aggregation-node.cc:302
      #14 0x00000000019ce8ba in impala::PlanFragmentExecutor::OpenInternal (this=0xa25fa80) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/plan-fragment-executor.cc:371
      #15 0x00000000019ce5f5 in impala::PlanFragmentExecutor::Open (this=0xa25fa80) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/plan-fragment-executor.cc:344
      #16 0x0000000001997c69 in impala::Coordinator::Wait (this=0xd2b2000) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/runtime/coordinator.cc:1094
      #17 0x00000000014fff69 in impala::ImpalaServer::QueryExecState::WaitInternal (this=0x187c6000) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/service/query-exec-state.cc:625
      #18 0x00000000014ffc3c in impala::ImpalaServer::QueryExecState::Wait (this=0x187c6000) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/service/query-exec-state.cc:601
      #19 0x0000000001519667 in boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>::operator() (this=0x7fd75ec69c48, p=0x187c6000) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:49
      #20 0x0000000001519176 in boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> >::operator()<boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list0> (this=0x7fd75ec69c58, f=..., a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:253
      #21 0x0000000001518e47 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> > >::operator() (this=0x7fd75ec69c48) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #22 0x0000000001518646 in boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf0<void, impala::ImpalaServer::QueryExecState>, boost::_bi::list1<boost::_bi::value<impala::ImpalaServer::QueryExecState*> > >, void>::invoke (function_obj_ptr=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:153
      #23 0x000000000132cef4 in boost::function0<void>::operator() (this=0x7fd75ec69c40) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
      #24 0x00000000015f245b in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7fd7ef894c30) at /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/be/src/util/thread.cc:318
      #25 0x00000000015f8e74 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x953d7c0, f=@0x953d7b8, a=...) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind.hpp:457
      #26 0x00000000015f8db7 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x953d7b8) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/bind/bind_template.hpp:20
      #27 0x00000000015f8d12 in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x953d600) at /data/jenkins/workspace/impala-umbrella-build-and-test/Impala-Toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:116
      #28 0x0000000001a4474a in thread_proxy ()
      #29 0x0000003c1b807851 in start_thread () from /lib64/libpthread.so.0
      #30 0x0000003c1b4e894d in clone () from /lib64/libc.so.6
      

      This is the DCHECK from the Impala logs:

      impalad.FATAL:F0908 08:37:55.798429 27780 expr-context.cc:99] Check failed: opened_
      

      Relevant info from Jenkins logs:

      03:23:07.137 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 1 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: rc/gzip/block] 
      03:23:07.137 [gw3] PASSED query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 16 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 5 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 [gw3] FAILED query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 5 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 2 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 [gw2] FAILED query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 1 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: rc/gzip/block] 
      03:23:07.137 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      03:23:07.137 [gw1] FAILED query_test/test_scanners.py::TestParquet::test_parquet[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 query_test/test_scanners.py::TestParquet::test_corrupt_files[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 [gw3] FAILED query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 2 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 1 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      03:23:07.137 [gw2] FAILED query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      03:23:07.137 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/def/block] 
      03:23:07.137 [gw1] FAILED query_test/test_scanners.py::TestParquet::test_corrupt_files[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 query_test/test_scanners.py::TestParquet::test_corrupt_rle_counts[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 [gw0] FAILED query_test/test_join_queries.py::TestJoinQueries::test_single_node_nested_loop_joins[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 query_test/test_join_queries.py::TestJoinQueries::test_single_node_nested_loop_joins_exhaustive[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 [gw0] SKIPPED query_test/test_join_queries.py::TestJoinQueries::test_single_node_nested_loop_joins_exhaustive[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 [gw3] FAILED query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 1 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      03:23:07.137 query_test/test_scanners.py::TestScanRangeLengths::test_scan_ranges[max_scan_range_length: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/snap/block] 
      03:23:07.137 [gw0] FAILED query_test/test_join_queries.py::TestJoinQueries::test_empty_build_joins[batch_size: 0 | exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      03:23:07.137 [gw1] ERROR query_test/test_scanners.py::TestParquet::test_corrupt_rle_counts[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] INTERNALERROR> Traceback (most recent call last):
      

        Activity

        alex.behm Alexander Behm added a comment -

        commit 218019e59fc6740e1564c91516b0dee46c64ed83
        Author: Alex Behm <alex.behm@cloudera.com>
        Date: Thu Sep 8 17:02:06 2016 -0700

        IMPALA-4098: Open()/Close() partition exprs once per fragment instance.

        Partition exprs stored in the descriptor table can be referenced by multiple
        exec nodes (and/or a data sink) within the same fragment instance, so the
        lifecycle of those exprs (Prepare/Open/Close) must be tied to the fragment
        instance and not to a particular exec node.

        A recent change exposed this improper lifecycle management: we now clone
        the partition exprs before using them, but by that time the exprs had already
        been closed, which caused the cloning function to hit a DCHECK.

        The fix is to tie the lifecycle of those exprs to that of the fragment
        instance.

        Testing: I could reliably reproduce the bug by running this query in a loop:

        set num_nodes=1;
        select count(a.year), count(a.month), count(a.int_col),
        count(b.year), count(b.month), count(b.int_col)
        from functional.alltypessmall a, functional.alltypessmall b;

        After this patch I was no longer able to reproduce the bug. I don't think
        it makes sense to add a test specifically for this bug because our existing
        tests already caught it, and the DCHECK that was hit no longer exists after
        the restructuring.

        Change-Id: Id179df645e500530f4418988f6ce64a03d669892
        Reviewed-on: http://gerrit.cloudera.org:8080/4340
        Reviewed-by: Alex Behm <alex.behm@cloudera.com>
        Tested-by: Internal Jenkins
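
The lifecycle problem described in the commit message can be illustrated with a minimal C++ sketch. Everything here is hypothetical and simplified: `ToyExprContext`, `ToyScanNode` and `RunFragment` are illustration-only names, not Impala classes, and the failed clone returns `nullptr` where Impala would hit the DCHECK. The sketch assumes two scan nodes in one fragment instance that share the same partition exprs, as in the self-join repro query, with the first node closed before the second is opened.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an expression context; NOT Impala's ExprContext.
class ToyExprContext {
 public:
  void Open() { opened_ = true; }
  void Close() { opened_ = false; }

  // Cloning requires an opened context. Impala asserts this with a DCHECK;
  // this sketch returns nullptr instead so the failure can be observed.
  std::unique_ptr<ToyExprContext> Clone() const {
    if (!opened_) return nullptr;
    std::unique_ptr<ToyExprContext> copy(new ToyExprContext());
    copy->opened_ = true;
    return copy;
  }

 private:
  bool opened_ = false;
};

// Hypothetical scan node that clones the shared partition exprs in Open().
// With closes_shared_exprs == true it models the improper lifecycle, where
// an exec node closes exprs that other nodes still reference.
class ToyScanNode {
 public:
  ToyScanNode(ToyExprContext* shared_exprs, bool closes_shared_exprs)
      : shared_exprs_(shared_exprs), closes_shared_exprs_(closes_shared_exprs) {
    assert(shared_exprs_ != nullptr);
  }

  // Returns false if the shared exprs could not be cloned.
  bool Open() {
    clone_ = shared_exprs_->Clone();
    return clone_ != nullptr;
  }

  void Close() {
    clone_.reset();
    if (closes_shared_exprs_) shared_exprs_->Close();
  }

 private:
  ToyExprContext* shared_exprs_;
  bool closes_shared_exprs_;
  std::unique_ptr<ToyExprContext> clone_;
};

// Models one fragment instance with two scan nodes sharing partition exprs.
// Returns true only if both nodes managed to clone the shared exprs.
bool RunFragment(bool node_closes_shared_exprs) {
  ToyExprContext partition_exprs;
  partition_exprs.Open();  // opened once, by the fragment instance

  ToyScanNode first(&partition_exprs, node_closes_shared_exprs);
  ToyScanNode second(&partition_exprs, node_closes_shared_exprs);

  bool ok = first.Open();
  first.Close();               // may close the shared exprs (buggy mode)
  ok = second.Open() && ok;    // fails if the shared exprs are already closed
  second.Close();

  partition_exprs.Close();  // closed once, by the fragment instance
  return ok;
}
```

In this sketch, `RunFragment(true)` reports a failure because the second node tries to clone exprs that the first node has already closed, while `RunFragment(false)` succeeds because only the fragment instance opens and closes the shared exprs.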


          People

          • Assignee:
            alex.behm Alexander Behm
            Reporter:
            alex.behm Alexander Behm
          • Votes:
            0
            Watchers:
            4
