Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Fixed
-
Impala 2.9.0
-
ghx-label-5
Description
We've seen this error once:
09:52:03 FAIL query_test/test_scanners.py::TestParquet::()::test_corrupt_files[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 09:52:03 =================================== FAILURES =================================== 09:52:03 TestParquet.test_corrupt_files[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: parquet/none] 09:52:03 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python 09:52:03 query_test/test_scanners.py:254: in test_corrupt_files 09:52:03 self.run_test_case('QueryTest/parquet-abort-on-error', vector) 09:52:03 common/impala_test_suite.py:374: in run_test_case 09:52:03 self.__verify_exceptions(test_section['CATCH'], str(e), use_db) 09:52:03 common/impala_test_suite.py:253: in __verify_exceptions 09:52:03 (expected_str, actual_str) 09:52:03 E AssertionError: Unexpected exception string. Expected: Column metadata states there are 50 values, but read 100 values from column element. 09:52:03 E Not found in actual: ImpalaBeeswaxException: Query aborted: 09:52:03 ----------------------------- Captured stderr call ----------------------------- 09:52:03 -- executing against localhost:21000 09:52:03 use functional_parquet; 09:52:03 09:52:03 SET disable_codegen=True; 09:52:03 SET abort_on_error=0; 09:52:03 SET exec_single_node_rows_threshold=0; 09:52:03 SET batch_size=0; 09:52:03 SET num_nodes=0; 09:52:03 -- executing against localhost:21000 09:52:03 set num_nodes=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 set num_scanner_threads=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_NODES=0;; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_SCANNER_THREADS=0;; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 set num_nodes=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 set num_scanner_threads=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 select id from bad_column_metadata; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_NODES=0;; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_SCANNER_THREADS=0;; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SELECT * from bad_parquet_strings_negative_len; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SELECT * from bad_parquet_strings_out_of_bounds; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 use functional_parquet; 09:52:03 09:52:03 SET disable_codegen=True; 09:52:03 SET abort_on_error=1; 09:52:03 SET exec_single_node_rows_threshold=0; 09:52:03 SET batch_size=0; 09:52:03 SET num_nodes=0; 09:52:03 -- executing against localhost:21000 09:52:03 set num_nodes=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 set num_scanner_threads=1; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 09:52:03 select id, cnt from bad_column_metadata t, (select count(*) cnt from t.int_array) v; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_NODES=0;; 09:52:03 09:52:03 -- executing against localhost:21000 09:52:03 SET NUM_SCANNER_THREADS=0;; 09:52:03 09:52:03 ===== 1 failed, 10942 passed, 139 skipped, 500 xfailed in 15060.87 seconds =====
I was able to reproduce this once on my dev machine. To do so, I altered the affected test like so:
@SkipIfOldAggsJoins.nested_types
@pytest.mark.parametrize('multiplier', xrange(32))
def test_corrupt_files(self, vector, multiplier):
Then I ran it in a loop:
i=0; while [ $? -eq 0 ]; do ((++i)); echo "Run: $i"; impala-py.test tests/query_test/test_scanners.py::TestParquet::test_corrupt_files --exploration_strategy=exhaustive -n8; done
After a few hours the test failed. From the log files I could not figure out what had happened.
I also noticed that some queries don't report an error even when they should. For example:
[localhost:21000] > select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u; Query: select id, cnt from functional_parquet.bad_column_metadata t, (select 1 cnt) u Query submitted at: 2017-06-01 22:34:55 (Coordinator: http://lv-desktop:25000) Query progress can be monitored at: http://lv-desktop:25000/query_plan?query_id=8f401de1b07987c9:ae67b91d00000000 WARNINGS: Column metadata states there are 11 values, but read 10 values from column id. file=hdfs://localhost:20500/test-warehouse/bad_column_metadata_parquet/bad_column_metadata.parquet
Without the second subquery a proper error is returned:
[localhost:21000] > select id from functional_parquet.bad_column_metadata t; Query: select id from functional_parquet.bad_column_metadata t Query submitted at: 2017-06-01 22:38:03 (Coordinator: http://lv-desktop:25000) Query progress can be monitored at: http://lv-desktop:25000/query_plan?query_id=c3456d826947f584:156f07da00000000 ERROR: Column metadata states there are 11 values, but read 10 values from column id. file=hdfs://localhost:20500/test-warehouse/bad_column_metadata_parquet/bad_column_metadata.parquet