Apache Arrow / ARROW-15221

[C++] Occasional failure arrow-compute-hash-join-node-test


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++

    Description

      The arrow-compute-hash-join-node-test test seems to be flaky. Excerpt from the full log:

      44/84 Test #35: arrow-compute-hash-join-node-test .........***Failed    8.63 sec
      Running arrow-compute-hash-join-node-test, redirecting output into /build/cpp/build/test-logs/arrow-compute-hash-join-node-test.txt (attempt 1/1)
      /arrow/cpp/build-support/run-test.sh: line 88: 19125 Segmentation fault      (core dumped) $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
      Running main() from /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
      [==========] Running 23 tests from 2 test suites.
      [----------] Global test environment set-up.
      [----------] 7 tests from HashJoin
      [ RUN      ] HashJoin.Random
      /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:934: Failure
      Failed
      '_error_or_value46.status()' failed with Cancelled: Scheduler cancelled
      Google Test trace:
      /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:1053: FULL_OUTER IS parallel = false
      /build/cpp/src/arrow/compute/exec
      

      Another failure was observed in the AMD64 Conda C++ job. Excerpt from the full log:

      [----------] 7 tests from HashJoin
      [ RUN      ] HashJoin.Random
      Found core dump, printing backtrace:
      warning: core file may not match specified executable file.
      [New LWP 19309]
      [New LWP 19308]
      [New LWP 19306]
      [New LWP 19310]
      [New LWP 19307]
      [New LWP 19311]
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
      Core was generated by `/build/cpp/debug/arrow-compute-hash-join-node-test'.
      Program terminated with signal SIGSEGV, Segmentation fault.
      #0  0x0000000000011479 in ?? ()
      [Current thread is 1 (Thread 0x7f8cfcb7d700 (LWP 19309))]
      Thread 6 (Thread 0x7f8cf9fff700 (LWP 19311)):
      #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cf9ffd4a0, expected=0, futex_word=0x7f8cff40a790) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
      #1  __pthread_cond_wait_common (abstime=0x7f8cf9ffd4a0, mutex=0x7f8cff40a7d8, cond=0x7f8cff40a768) at pthread_cond_wait.c:539
      #2  __pthread_cond_timedwait (cond=0x7f8cff40a768, mutex=0x7f8cff40a7d8, abstime=0x7f8cf9ffd4a0) at pthread_cond_wait.c:667
      #3  0x00007f8d04411496 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=0x7f8cff40a760) at src/background_thread.c:255
      #4  background_work_sleep_once (ind=<optimized out>, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
      #5  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:497
      #6  background_thread_entry () at src/background_thread.c:522
      #7  0x00007f8d0112a6db in start_thread (arg=0x7f8cf9fff700) at pthread_create.c:463
      #8  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      Thread 5 (Thread 0x7f8cfedff700 (LWP 19307)):
      #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cfedfd4a0, expected=0, futex_word=0x7f8cff40a5f0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
      #1  __pthread_cond_wait_common (abstime=0x7f8cfedfd4a0, mutex=0x7f8cff40a638, cond=0x7f8cff40a5c8) at pthread_cond_wait.c:539
      #2  __pthread_cond_timedwait (cond=0x7f8cff40a5c8, mutex=0x7f8cff40a638, abstime=0x7f8cfedfd4a0) at pthread_cond_wait.c:667
      #3  0x00007f8d04411bc6 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=<optimized out>) at src/background_thread.c:255
      #4  background_work_sleep_once (ind=0, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
      #5  background_thread0_work (tsd=<optimized out>) at src/background_thread.c:452
      #6  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:490
      #7  background_thread_entry () at src/background_thread.c:522
      #8  0x00007f8d0112a6db in start_thread (arg=0x7f8cfedff700) at pthread_create.c:463
      #9  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      Thread 4 (Thread 0x7f8cfb5ff700 (LWP 19310)):
      #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cfb5fd4a0, expected=0, futex_word=0x7f8cff40a6c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
      #1  __pthread_cond_wait_common (abstime=0x7f8cfb5fd4a0, mutex=0x7f8cff40a708, cond=0x7f8cff40a698) at pthread_cond_wait.c:539
      #2  __pthread_cond_timedwait (cond=0x7f8cff40a698, mutex=0x7f8cff40a708, abstime=0x7f8cfb5fd4a0) at pthread_cond_wait.c:667
      #3  0x00007f8d04411496 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=0x7f8cff40a690) at src/background_thread.c:255
      #4  background_work_sleep_once (ind=<optimized out>, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
      #5  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:497
      #6  background_thread_entry () at src/background_thread.c:522
      #7  0x00007f8d0112a6db in start_thread (arg=0x7f8cfb5ff700) at pthread_create.c:463
      #8  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      Thread 3 (Thread 0x7f8cff6db0c0 (LWP 19306)):
      #0  0x00005630dcb287d4 in __gnu_cxx::operator==<int const*, std::vector<int, std::allocator<int> > > (__lhs=..., __rhs=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/stl_iterator.h:890
      #1  0x00005630dcb1d3d1 in std::vector<int, std::allocator<int> >::empty (this=0x7ffc7ab72ae0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/stl_vector.h:1005
      #2  0x00005630dcafd4ea in arrow::compute::HashJoinSimpleInt (join_type=arrow::compute::JoinType::FULL_OUTER, l=..., null_in_key_l=..., r=..., null_in_key_r=..., result_l=0x7ffc7ab72cb0, result_r=0x7ffc7ab72cd0, output_length_limit=100000, length_limit_reached=0x7ffc7ab72e77) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:781
      #3  0x00005630dcafe22c in arrow::compute::HashJoinSimple (ctx=0x5630de57a320, join_type=arrow::compute::JoinType::FULL_OUTER, cmp=..., num_key_fields=1, key_id_l=..., key_id_r=..., original_l=..., original_r=..., l=..., r=..., output_ids_l=..., output_ids_r=..., output_length_limit=100000, length_limit_reached=0x7ffc7ab72e77) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:887
      #4  0x00005630dcb011c0 in arrow::compute::HashJoin_Random_Test::TestBody (this=0x5630de47a300) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:1067
      #5  0x00007f8d056a3c9c in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x5630de47a300, method=&virtual testing::Test::TestBody(), location=0x7f8d056b897b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
      #6  0x00007f8d0569add2 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x5630de47a300, method=&virtual testing::Test::TestBody(), location=0x7f8d056b897b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
      #7  0x00007f8d05675c03 in testing::Test::Run (this=0x5630de47a300) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2682
      #8  0x00007f8d0567663b in testing::TestInfo::Run (this=0x5630de476b50) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2861
      #9  0x00007f8d05677010 in testing::TestSuite::Run (this=0x5630de476c70) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:3015
      #10 0x00007f8d0568731c in testing::internal::UnitTestImpl::RunAllTests (this=0x5630de4762e0) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5855
      #11 0x00007f8d056a4ce8 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5630de4762e0, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f8d05686ed8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f8d056b9468 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
      #12 0x00007f8d0569c064 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5630de4762e0, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f8d05686ed8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f8d056b9468 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
      #13 0x00007f8d056857b7 in testing::UnitTest::Run (this=0x7f8d056e5260 <testing::UnitTest::GetInstance()::instance>) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5438
      #14 0x00007f8d056e6919 in RUN_ALL_TESTS () at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2490
      #15 0x00007f8d056e695c in main (argc=1, argv=0x7ffc7ab739d8) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:52
      #16 0x00007f8d01701bf7 in __libc_start_main (main=0x7f8d056e691b <main(int, char**)>, argc=1, argv=0x7ffc7ab739d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7ab739c8) at ../csu/libc-start.c:310
      #17 0x00005630dcaedf49 in _start ()
      Thread 2 (Thread 0x7f8cfdb7e700 (LWP 19308)):
      #0  0x00007f8d01130ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5630de565a80) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
      #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5630de565a30, cond=0x5630de565a58) at pthread_cond_wait.c:502
      #2  __pthread_cond_wait (cond=0x5630de565a58, mutex=0x5630de565a30) at pthread_cond_wait.c:655
      #3  0x00007f8d01b994d1 in __gthread_cond_wait (__mutex=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, __cond=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/src/c++11/condition_variable.cc:865
      #4  std::__condvar::wait (__m=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, this=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/gthr-default.h:155
      #5  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
      #6  0x00007f8d02f8fbb7 in arrow::internal::WorkerLoop (state=..., it=...) at /arrow/cpp/src/arrow/util/thread_pool.cc:195
      #7  0x00007f8d02f90960 in arrow::internal::ThreadPool::<lambda()>::operator()(void) const (__closure=0x5630de561958) at /arrow/cpp/src/arrow/util/thread_pool.cc:344
      #8  0x00007f8d02f97498 in std::__invoke_impl<void, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(std::__invoke_other, arrow::internal::ThreadPool::<lambda()> &&) (__f=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:60
      #9  0x00007f8d02f97438 in std::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(arrow::internal::ThreadPool::<lambda()> &&) (__fn=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:95
      #10 0x00007f8d02f973d6 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5630de561958) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:244
      #11 0x00007f8d02f97293 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::operator()(void) (this=0x5630de561958) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:251
      #12 0x00007f8d02f971e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > > >::_M_run(void) (this=0x5630de561950) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:195
      #13 0x00007f8d01b9d9d4 in std::execute_native_thread_routine (__p=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/new_allocator.h:82
      #14 0x00007f8d0112a6db in start_thread (arg=0x7f8cfdb7e700) at pthread_create.c:463
      #15 0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      Thread 1 (Thread 0x7f8cfcb7d700 (LWP 19309)):
      #0  0x0000000000011479 in ?? ()
      #1  0x00007f8d0331fae3 in arrow::compute::TaskSchedulerImpl::ScheduleMore (this=0x5630de572960, thread_id=0, num_tasks_finished=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:326
      #2  0x00007f8d0331e94c in arrow::compute::TaskSchedulerImpl::StartTaskGroup (this=0x5630de572960, thread_id=0, group_id=1, total_num_tasks=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:153
      #3  0x00007f8d0327d952 in arrow::compute::HashJoinBasicImpl::ProbeQueuedBatches (this=0x7f8cec24aee0, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:726
      #4  0x00007f8d0327d13b in arrow::compute::HashJoinBasicImpl::BuildHashTable_on_finished (this=0x7f8cec24aee0, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:663
      #5  0x00007f8d0327d2db in arrow::compute::HashJoinBasicImpl::RegisterBuildHashTable()::{lambda(unsigned long)#2}::operator()(unsigned long) const (__closure=0x5630de654840, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:674
      #6  0x00007f8d0328213c in std::_Function_handler<arrow::Status (unsigned long), arrow::compute::HashJoinBasicImpl::RegisterBuildHashTable()::{lambda(unsigned long)#2}>::_M_invoke(std::_Any_data const&, unsigned long&&) (__functor=..., __args#0=@0x7f8cfcb7b138: 0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:286
      #7  0x00007f8d032aa81e in std::function<arrow::Status (unsigned long)>::operator()(unsigned long) const (this=0x5630de654840, __args#0=0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:688
      #8  0x00007f8d0331f041 in arrow::compute::TaskSchedulerImpl::OnTaskGroupFinished (this=0x5630de572960, thread_id=0, group_id=0, all_task_groups_finished=0x7f8cfcb7b230) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:244
      #9  0x00007f8d0331f934 in arrow::compute::TaskSchedulerImpl::<lambda(size_t)>::operator()(size_t) const (__closure=0x5630de6a1390, thread_id=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:349
      #10 0x00007f8d0332152f in std::_Function_handler<arrow::Status(long unsigned int), arrow::compute::TaskSchedulerImpl::ScheduleMore(size_t, int)::<lambda(size_t)> >::_M_invoke(const std::_Any_data &, unsigned long &&) (__functor=..., __args#0=@0x7f8cfcb7b2b8: 0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:286
      #11 0x00007f8d032aa81e in std::function<arrow::Status (unsigned long)>::operator()(unsigned long) const (this=0x5630de654f70, __args#0=0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:688
      #12 0x00007f8d032a7f8c in arrow::compute::HashJoinNode::ScheduleTaskCallback(std::function<arrow::Status (unsigned long)>)::{lambda()#1}::operator()() const (__closure=0x5630de654f68) at /arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:604
      #13 0x00007f8d032b9329 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::compute::HashJoinNode::ScheduleTaskCallback(std::function<arrow::Status (unsigned long)>)::{lambda()#1}>::invoke() (this=0x5630de654f60) at /arrow/cpp/src/arrow/util/functional.h:152
      #14 0x00007f8d02f91ade in arrow::internal::FnOnce<void ()>::operator()() && (this=0x7f8cfcb7b3f0) at /arrow/cpp/src/arrow/util/functional.h:140
      #15 0x00007f8d02f8fa87 in arrow::internal::WorkerLoop (state=..., it=...) at /arrow/cpp/src/arrow/util/thread_pool.cc:177
      #16 0x00007f8d02f90960 in arrow::internal::ThreadPool::<lambda()>::operator()(void) const (__closure=0x5630de659468) at /arrow/cpp/src/arrow/util/thread_pool.cc:344
      #17 0x00007f8d02f97498 in std::__invoke_impl<void, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(std::__invoke_other, arrow::internal::ThreadPool::<lambda()> &&) (__f=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:60
      #18 0x00007f8d02f97438 in std::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(arrow::internal::ThreadPool::<lambda()> &&) (__fn=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:95
      #19 0x00007f8d02f973d6 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5630de659468) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:244
      #20 0x00007f8d02f97293 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::operator()(void) (this=0x5630de659468) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:251
      #21 0x00007f8d02f971e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > > >::_M_run(void) (this=0x5630de659460) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:195
      #22 0x00007f8d01b9d9d4 in std::execute_native_thread_routine (__p=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/new_allocator.h:82
      #23 0x00007f8d0112a6db in start_thread (arg=0x7f8cfcb7d700) at pthread_create.c:463
      #24 0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      /build/cpp/src/arrow/compute/exec 
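
Since the failure is intermittent, one way to try to reproduce it locally is to run only HashJoin.Random in a loop until it fails. The following is a minimal sketch, not part of the Arrow build tooling: the helper name run_until_failure is hypothetical, and the binary path in the usage note is assumed from the "Core was generated by" line above and will differ per build. --gtest_filter is the standard GoogleTest flag for selecting tests.

```python
import subprocess
import sys


def run_until_failure(cmd, max_attempts=100):
    """Run cmd repeatedly.

    Returns (attempt, returncode) for the first attempt that exits with a
    non-zero status, or None if every attempt passed.
    """
    for attempt in range(1, max_attempts + 1):
        # Capture stdout and stderr together so the failing run's log can be shown.
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            # On POSIX, a negative return code means the process was killed
            # by a signal (for example -11 for SIGSEGV).
            sys.stdout.write(result.stdout.decode(errors="replace"))
            return attempt, result.returncode
    return None
```

A possible invocation, again assuming the path from the log: run_until_failure(["/build/cpp/debug/arrow-compute-hash-join-node-test", "--gtest_filter=HashJoin.Random"]). A return value of None only means no failure was seen within max_attempts runs; it does not show the race is absent.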

      Attachments

        1. log.txt
          282 kB
          David Li
        2. hash-join-node-test-failure.log
          399 kB
          Weston Pace

            People

              Assignee: Unassigned
              Reporter: David Li (lidavidm)