IMPALA-7422

Crash in QueryState::PublishFilter() fragment_map_.count(params.dst_fragment_idx) == 1 (0 vs. 1)


Details

    Description

      Ran into this running core tests on one of my patches.

       query-state.cc:506] Check failed: fragment_map_.count(params.dst_fragment_idx) == 1 (0 vs. 1) 
      *** Check failure stack trace: ***
          @          0x4387b8c
          @          0x4389431
          @          0x4387566
          @          0x438ab2d
          @          0x1e0ba94
          @          0x1f3d097
          @          0x303fe61
          @          0x303ddef
          @          0x18fa28f
          @          0x1d0d1b8
          @          0x1d054b8
          @          0x1d06bde
          @          0x1d06a74
          @          0x1d067c0
          @          0x1d066d3
          @          0x1c2d3c1
          @          0x2041992
          @          0x2049a6a
          @          0x204998e
          @          0x2049951
          @          0x32b31d9
          @     0x7fb7d61a2e24
          @     0x7fb7d5ed034c
      
      20:10:21 [gw0] PASSED query_test/test_runtime_filters.py::TestMinMaxFilters::test_min_max_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: kudu/none] 
      20:10:29 query_test/test_runtime_filters.py::TestMinMaxFilters::test_large_strings 
      20:10:29 [gw0] PASSED query_test/test_runtime_filters.py::TestMinMaxFilters::test_large_strings 
      20:10:30 query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
      20:10:30 [gw3] PASSED query_test/test_runtime_filters.py::TestBloomFilters::test_bloom_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record] 
      20:10:32 query_test/test_runtime_filters.py::TestBloomFilters::test_bloom_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record] 
      20:10:32 [gw4] PASSED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block] 
      20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block] 
      20:10:39 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block] 
      20:10:39 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block] 
      20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block] 
      20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none] 
      20:10:39 [gw7] FAILED query_test/test_queries.py::TestQueries::test_sort[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
      20:10:39 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | debug_action: None | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block] 
      20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none] 
      20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none] 
      20:10:39 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/bzip/block] 
      20:10:39 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block] 
      20:10:39 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/none] 
      20:10:39 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/none] 
      20:10:39 [gw7] FAILED query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 16 | debug_action: None | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block] 
      20:10:40 query_test/test_scanners.py::TestScannersAllTableFormats::test_scanners[batch_size: 1 | debug_action: -1:OPEN:SET_DENY_RESERVATION_PROBABILITY@1.0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/def/block] 
      20:10:40 [gw5] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block] 
      20:10:40 query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: text/snap/block] 
      20:10:45 [gw4] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: avro/none] Traceback (most recent call last):
      

      I looked at the code and could not find any synchronization between 'fragment_map_' being modified in QueryState::StartFInstances() and being read in QueryState::PublishFilter(). Before IMPALA-7163, instances_prepared_promise_ appears to have functioned as a barrier between those two functions, so the synchronization existed but was not documented.
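
      The barrier pattern described above can be sketched roughly as follows. This is a minimal illustration, not the Impala code: QueryStateSketch, RunOnce(), and the map contents are hypothetical, and std::promise stands in for whatever promise type Impala uses for instances_prepared_promise_. The reader waits on the promise before touching the map, so the writer's inserts are guaranteed to be visible.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <thread>

// Hypothetical stand-in for QueryState; not the actual Impala class.
class QueryStateSketch {
 public:
  QueryStateSketch()
      : prepared_future_(prepared_promise_.get_future().share()) {}

  // Writer: populate the map, then signal that it will no longer change.
  void StartFInstances(int num_fragments) {
    for (int i = 0; i < num_fragments; ++i) fragment_map_[i] = i * 10;
    // set_value() synchronizes with wait() returning in the reader,
    // so every insert above is visible once the reader wakes up.
    prepared_promise_.set_value();
  }

  // Reader: block until the map is fully populated before reading it.
  bool PublishFilter(int dst_fragment_idx) const {
    prepared_future_.wait();
    return fragment_map_.count(dst_fragment_idx) == 1;
  }

 private:
  std::map<int, int> fragment_map_;
  std::promise<void> prepared_promise_;
  std::shared_future<void> prepared_future_;  // declared after the promise
};

// Starts the reader before the writer to exercise the ordering that
// would fail without the barrier.
bool RunOnce() {
  QueryStateSketch qs;
  bool found = false;
  std::thread reader([&] { found = qs.PublishFilter(2); });
  std::thread writer([&] { qs.StartFInstances(4); });
  reader.join();
  writer.join();
  return found;
}
```

      Without the wait() call in PublishFilter(), the reader could observe an empty or partially built map, which is the kind of unsynchronized access that would be consistent with the "0 vs. 1" check failure in this report.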

          People

            kwho Michael Ho
            tarmstrong Tim Armstrong