Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5427

beeswax get_state() can return EXCEPTION before error is visible via get_log()

    Details

    • Epic Color:
      ghx-label-5

      Description

      We've seen this error once:

      09:52:03 FAIL query_test/test_scanners.py::TestParquet::()::test_corrupt_files[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none]
      09:52:03 =================================== FAILURES ===================================
      09:52:03  TestParquet.test_corrupt_files[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 
      09:52:03 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      09:52:03 query_test/test_scanners.py:254: in test_corrupt_files
      09:52:03     self.run_test_case('QueryTest/parquet-abort-on-error', vector)
      09:52:03 common/impala_test_suite.py:374: in run_test_case
      09:52:03     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      09:52:03 common/impala_test_suite.py:253: in __verify_exceptions
      09:52:03     (expected_str, actual_str)
      09:52:03 E   AssertionError: Unexpected exception string. Expected: Column metadata states there are 50 values, but read 100 values from column element.
      09:52:03 E   Not found in actual: ImpalaBeeswaxException: Query aborted:
      09:52:03 ----------------------------- Captured stderr call -----------------------------
      09:52:03 -- executing against localhost:21000
      09:52:03 use functional_parquet;
      09:52:03 
      09:52:03 SET disable_codegen=True;
      09:52:03 SET abort_on_error=0;
      09:52:03 SET exec_single_node_rows_threshold=0;
      09:52:03 SET batch_size=0;
      09:52:03 SET num_nodes=0;
      09:52:03 -- executing against localhost:21000
      09:52:03 set num_nodes=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 set num_scanner_threads=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_NODES=0;;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_SCANNER_THREADS=0;;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 set num_nodes=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 set num_scanner_threads=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 select id from bad_column_metadata;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_NODES=0;;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_SCANNER_THREADS=0;;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SELECT * from bad_parquet_strings_negative_len;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SELECT * from bad_parquet_strings_out_of_bounds;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 use functional_parquet;
      09:52:03 
      09:52:03 SET disable_codegen=True;
      09:52:03 SET abort_on_error=1;
      09:52:03 SET exec_single_node_rows_threshold=0;
      09:52:03 SET batch_size=0;
      09:52:03 SET num_nodes=0;
      09:52:03 -- executing against localhost:21000
      09:52:03 set num_nodes=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 set num_scanner_threads=1;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 
      09:52:03 select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_NODES=0;;
      09:52:03 
      09:52:03 -- executing against localhost:21000
      09:52:03 SET NUM_SCANNER_THREADS=0;;
      09:52:03 
      09:52:03 ===== 1 failed, 10942 passed, 139 skipped, 500 xfailed in 15060.87 seconds =====
      

      I was able to reproduce this once on my dev machine. To do so, I altered the affected test like so:

      @SkipIfOldAggsJoins.nested_types
      @pytest.mark.parametrize('multiplier', xrange(32))
      def test_corrupt_files(self, vector, multiplier):
      

      Then I ran it in a loop:

      i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test tests/query_test/test_scanners.py::TestParquet::test_corrupt_files --exploration_strategy=exhaustive -n8; done
      

      After a few hours the test failed. From the log files I could not figure out what had happened.

      I also noticed that some queries don't report an error even when they should. For example:

      [localhost:21000] > select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u;
      Query: select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u
      Query submitted at: 2017-06-01 22:34:55 (Coordinator: http://lv-desktop:25000)
      Query progress can be monitored at: http://lv-desktop:25000/query_plan?query_id=8f401de1b07987c9:ae67b91d00000000
      WARNINGS: Column metadata states there are 11 values, but read 10 values from column id. file=hdfs://localhost:20500/test-warehouse/bad_column_metadata_parquet/bad_column_metadata.parquet
      

      Without the second subquery a proper error is returned:

      [localhost:21000] > select id from functional_parquet.bad_column_metadata t;
      Query: select id from functional_parquet.bad_column_metadata t
      Query submitted at: 2017-06-01 22:38:03 (Coordinator: http://lv-desktop:25000)
      Query progress can be monitored at: http://lv-desktop:25000/query_plan?query_id=c3456d826947f584:156f07da00000000
      ERROR: Column metadata states there are 11 values, but read 10 values from column id. file=hdfs://localhost:20500/test-warehouse/bad_column_metadata_parquet/bad_column_metadata.parquet
      

        Attachments

          Activity

            People

            • Assignee:
              lv Lars Volker
              Reporter:
              lv Lars Volker
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: