IMPALA-7906

Crash in JVM PSPromotionManager::copy_to_survivor_space

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: Impala 3.2.0
    • Fix Version/s: None
    • Component/s: Backend

    Description

      #0  0x00007f44ca5261f7 in raise () from /lib64/libc.so.6
      #1  0x00007f44ca5278e8 in abort () from /lib64/libc.so.6
      #2  0x00007f44cd726185 in os::abort(bool) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #3  0x00007f44cd8c8593 in VMError::report_and_die() () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #4  0x00007f44cd8c8a7e in crash_handler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #5  0x00007f44cd724f72 in os::Linux::chained_handler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #6  0x00007f44cd72b5f6 in JVM_handle_linux_signal () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #7  0x00007f44cd721be3 in signalHandler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #8  <signal handler called>
      #9  0x00007f44cd713e95 in oopDesc::print_on(outputStream*) const () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #10 0x00007f44cd72afdb in os::print_register_info(outputStream*, void*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #11 0x00007f44cd8c6c13 in VMError::report(outputStream*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #12 0x00007f44cd8c818a in VMError::report_and_die() () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #13 0x00007f44cd72b68f in JVM_handle_linux_signal () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #14 0x00007f44cd721be3 in signalHandler(int, siginfo*, void*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #15 <signal handler called>
      #16 0x00007f44cd78f562 in oopDesc* PSPromotionManager::copy_to_survivor_space<false>(oopDesc*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #17 0x00007f44cd7924a5 in PSRootsClosure<false>::do_oop(oopDesc**) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #18 0x00007f44cd716a96 in InterpreterOopMap::iterate_oop(OffsetClosure*) const () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #19 0x00007f44cd38f789 in frame::oops_interpreted_do(OopClosure*, CLDClosure*, RegisterMap const*, bool) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #20 0x00007f44cd86eaa1 in JavaThread::oops_do(OopClosure*, CLDClosure*, CodeBlobClosure*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #21 0x00007f44cd79270f in ThreadRootsTask::do_it(GCTaskManager*, unsigned int) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #22 0x00007f44cd3d7ecf in GCTaskThread::run() () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #23 0x00007f44cd727338 in java_start(Thread*) () from /usr/java/jdk1.8.0_144/jre/lib/amd64/server/libjvm.so
      #24 0x00007f44ca8bbe25 in start_thread () from /lib64/libpthread.so.0
      #25 0x00007f44ca5e934d in clone () from /lib64/libc.so.6
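Reading the backtrace from the bottom up, it appears to contain two nested faults, each marked by "<signal handler called>". The first signal interrupted PSPromotionManager::copy_to_survivor_space (frame #16) while a GC worker thread was scanning oops in an interpreted Java frame (frames #17 to #22). The second signal interrupted oopDesc::print_on (frame #9) while the JVM's crash handler was printing register contents (frames #10 to #14).

The helper below, interrupted_frames, is a hypothetical sketch for pulling those interrupted frames out of a gdb backtrace. It is not part of Impala's tooling, and it assumes gdb's usual "#N 0xADDR in func () from lib" line format.

```python
import re
from typing import List

# One gdb backtrace line, e.g. "#16 0x00007f... in func(args) () from /lib/x.so"
# or "#8  <signal handler called>". The address prefix is optional.
_FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.*?)\s*$")
# Trailing " () from /path/lib.so" that gdb appends for frames without debug info.
_FROM_LIB = re.compile(r"\s+\(\)\s+from\s+\S+$")
_MARKER = "<signal handler called>"


def interrupted_frames(backtrace: str) -> List[str]:
    """Return the function each signal interrupted, most recent signal first.

    gdb prints the innermost frame (#0) first, so the frame that was executing
    when a signal arrived is the one listed immediately after the marker.
    """
    frames = []
    for line in backtrace.splitlines():
        match = _FRAME.match(line.strip())
        if match:
            frames.append(_FROM_LIB.sub("", match.group(1)))
    return [
        frames[i + 1]
        for i, text in enumerate(frames)
        if text == _MARKER and i + 1 < len(frames)
    ]
```

Applied to the full trace above, it should return the oopDesc::print_on frame first and the copy_to_survivor_space frame second. The first entry is the secondary fault inside error reporting. The original fault is the last entry.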
      

      These are the tests that were running at the time of the crash:

      06:53:04 [gw1] PASSED query_test/test_mem_usage_scaling.py::TestQueryMemLimitScaling::test_mem_usage_scaling[mem_limit: -1 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
      06:53:07 query_test/test_mem_usage_scaling.py::TestQueryMemLimitScaling::test_mem_usage_scaling[mem_limit: 400m | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
      06:53:07 [gw5] PASSED query_test/test_analytic_tpcds.py::TestAnalyticTpcds::test_analytic_functions_tpcds[batch_size: 1 | protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
      06:53:08 query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol: beeswax | table_format: text/gzip/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action: 0:GETNEXT:WAIT | cancel_delay: 0.01 | cpu_limit_s: 100000 | query: select * from lineitem limit 50 | fail_rpc_action: COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL | join_before_close: True | buffer_pool_limit: 0] 
      06:53:08 [gw5] PASSED query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol: beeswax | table_format: text/gzip/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action: 0:GETNEXT:WAIT | cancel_delay: 0.01 | cpu_limit_s: 100000 | query: select * from lineitem limit 50 | fail_rpc_action: COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL | join_before_close: True | buffer_pool_limit: 0] 
      06:53:08 [gw2] PASSED query_test/test_decimal_casting.py::TestDecimalCasting::test_min_max_zero_null[cast_from: number | decimal_type: (31, 14) | exec_option: {'decimal_v2': 'true'}] 
      06:53:09 query_test/test_decimal_casting.py::TestDecimalCasting::test_min_max_zero_null[cast_from: number | decimal_type: (31, 22) | exec_option: {'decimal_v2': 'true'}] 
      06:54:07 query_test/test_cancellation.py::TestCancellationParallel::test_cancel_select[protocol: beeswax | table_format: kudu/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | query_type: SELECT | wait_action: 0:GETNEXT:WAIT | cancel_delay: 0 | cpu_limit_s: 100000 | query: compute stats lineitem | fail_rpc_action: COORD_CANCEL_QUERY_FINSTANCES_RPC:FAIL | join_before_close: True | buffer_pool_limit: 0] 
      06:54:08 [gw6] FAILED query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_decimal_ops[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0}] 
      06:54:08 query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0}] 
      06:54:08 [gw6] FAILED query_test/test_decimal_fuzz.py::TestDecimalFuzz::test_width_bucket[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0}] 
      06:54:08 query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol: beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} | table_format: text/none] 
      06:54:08 [gw6] ERROR query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol: beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': 'false', 'decimal_v2': 'false', 'batch_size': 0} | table_format: text/none] 
      06:54:08 query_test/test_decimal_queries.py::TestDecimalQueries::test_queries[protocol: hs2 | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': 'true', 'decimal_v2': 'false', 'batch_size': 0} | table_format: parquet/none] 
      

      One thing that's a little interesting is that one of the queries running at the time was select repeat('AZ', 128 * 1024 * 1024), which passes a large string from the backend to the frontend. Maybe something went wrong there?
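For a sense of scale, the arithmetic below works out how large that string is. It makes two assumptions:

- The input is ASCII, so the backend stores one byte per character.
- As a further, unverified assumption, the value would be materialized as a java.lang.String on the JDK 8 runtime seen in the backtrace. On JDK 8 a String is backed by a UTF-16 char[] at two bytes per character.

```python
# Size of the value returned by: select repeat('AZ', 128 * 1024 * 1024)
MIB = 1024 * 1024
REPEATS = 128 * 1024 * 1024  # number of times 'AZ' is repeated
PATTERN = "AZ"               # two ASCII characters, one byte each

# Bytes in the backend's result string (one byte per ASCII character).
result_bytes = REPEATS * len(PATTERN.encode("ascii"))

# Assumption: if the value is turned into a java.lang.String on JDK 8, each
# char occupies 2 bytes in the UTF-16 char[] backing array.
java_char_bytes = REPEATS * len(PATTERN) * 2

print(f"backend string: {result_bytes} bytes ({result_bytes // MIB} MiB)")
print(f"JDK 8 String char[]: {java_char_bytes} bytes ({java_char_bytes // MIB} MiB)")
```

This prints 268435456 bytes (256 MiB) for the backend string and 536870912 bytes (512 MiB) for a JDK 8 String's character data.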

      Attachments

        1. hs_err_pid6290.log (250 kB, attached by Tim Armstrong)

      People

        Assignee: Tim Armstrong
        Reporter: Tim Armstrong
        Votes: 0
        Watchers: 2