IMPALA-6109

HBase in minicluster appears to be flaky

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Infrastructure

      Description

      I saw a bunch of HBase-related tests fail. The failing tests are listed first, followed by the errors they reported:

      metadata.test_compute_stats.TestHbaseComputeStats.test_hbase_compute_stats[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      metadata.test_compute_stats.TestHbaseComputeStats.test_hbase_compute_stats_incremental[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_hbase_queries.TestHBaseQueries.test_hbase_scan_node[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_join_queries.TestJoinQueries.test_joins_against_hbase[batch_size: 0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
      query_test.test_hbase_queries.TestHBaseQueries.test_hbase_row_key[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_observability.TestObservability.test_scan_summary
      query_test.test_hbase_queries.TestHBaseQueries.test_hbase_filters[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_scanners.TestScannersAllTableFormats.test_scanners[batch_size: 0 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_mt_dop.TestMtDop.test_mt_dop[mt_dop: 2 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      query_test.test_hbase_queries.TestHBaseQueries.test_hbase_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: hbase/none]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: CLOSE | action: FAIL | query: select 1 from alltypessmall order by id]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: OPEN | action: CANCEL | query: select row_number() over (partition by int_col order by id) from alltypessmall]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: GETNEXT_SCANNER | action: MEM_LIMIT_EXCEEDED | query: select * from alltypes]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: GETNEXT | action: MEM_LIMIT_EXCEEDED | query: select 1 from alltypessmall order by id limit 100]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: PREPARE_SCANNER | action: MEM_LIMIT_EXCEEDED | query: select count(int_col) from alltypessmall group by id]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: OPEN | action: MEM_LIMIT_EXCEEDED | query: select count from alltypessmall]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select c from (select id c from alltypessmall order by id limit 10) v where c = 1]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select * from alltypessmall union all select * from alltypessmall]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 0 | location: CLOSE | action: MEM_LIMIT_EXCEEDED | query: select 1 from alltypessmall a join alltypessmall b on a.id != b.id]
      failure.test_failpoints.TestFailpoints.test_failpoints[table_format: hbase/none | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | mt_dop: 4 | location: PREPARE | action: FAIL | query: select 1 from alltypessmall a join alltypessmall b on a.id = b.id]

      E    Query aborted:RuntimeException: couldn't retrieve HBase table (functional_hbase.alltypessmall) info:
      E   This server is in the failed servers list: localhost/127.0.0.1:16202
      E   CAUSED BY: FailedServerException: This server is in the failed servers list: localhost/127.0.0.1:16202
      
      E   ImpalaBeeswaxException: ImpalaBeeswaxException:
      E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
      E    MESSAGE: RuntimeException: couldn't retrieve HBase table (functional_hbase.alltypessmall) info:
      E   Connection refused
      E   CAUSED BY: ConnectException: Connection refused
      
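      Both errors point at the HBase server at localhost:16202 not accepting connections: one run got "Connection refused" outright, and the other hit the HBase client's failed-servers list, which it uses to briefly skip a server it recently failed to reach. When reproducing this, one way to tell a down server apart from a query-level problem is to probe that address before running the tests. The following is a minimal sketch under that assumption; the helpers `port_is_open` and `wait_for_port` and their retry parameters are hypothetical and are not part of Impala's test infrastructure. Only the address localhost:16202 comes from the error above.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and unreachable hosts.
        return False


def wait_for_port(host, port, attempts=30, delay=1.0):
    """Poll (host, port) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Hypothetical check against the address from the error above;
    # adjust host/port for your own minicluster layout.
    ok = wait_for_port("localhost", 16202, attempts=3, delay=1.0)
    print("localhost:16202 reachable" if ok else "localhost:16202 not reachable")
```

      A plain TCP probe only shows that something is listening on the port. It does not prove the HBase server is healthy or that its regions are online.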

              People

              • Assignee: Lars Volker
              • Reporter: Tim Armstrong
              • Votes: 0
              • Watchers: 5