IMPALA-4585

test_udfs.py fails on S3 and local filesystem builds

    Details

      Description

      Looks like a test infra problem with replacing the special __HDFS_FILENAME__ marker.

      Michael, might this be related to your recent fix to the test framework?

      Jenkins console snippet:

      04:05:38 =================================== FAILURES ===================================
      04:05:38  TestUdfs.test_udf_errors[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      04:05:38 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      04:05:38 query_test/test_udfs.py:79: in test_udf_errors
      04:05:38     self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
      04:05:38 common/impala_test_suite.py:327: in run_test_case
      04:05:38     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      04:05:38 common/impala_test_suite.py:218: in __verify_exceptions
      04:05:38     (expected_str, actual_str)
      04:05:38 E   AssertionError: Unexpected exception string. Expected: Could not load binary: file:/tmp/test-warehouse/not-a-real-file.so
      04:05:38 E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Could not load binary: __HDFS_FILENAME__(2): No such file or directory
      04:05:38 ---------------------------- Captured stderr setup -----------------------------
      04:05:38 -- connecting to: localhost:21000
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 DROP DATABASE IF EXISTS `test_udf_errors_cace194a` CASCADE;
      04:05:38 
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 CREATE DATABASE `test_udf_errors_cace194a`;
      04:05:38 
      04:05:38 MainThread: Created database "test_udf_errors_cace194a" for test ID "query_test/test_udfs.py::TestUdfs::()::test_udf_errors[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]"
      04:05:38 ----------------------------- Captured stderr call -----------------------------
      04:05:38 -- executing against localhost:21000
      04:05:38 use test_udf_errors_cace194a;
      04:05:38 
      04:05:38 SET disable_codegen=1;
      04:05:38 SET abort_on_error=1;
      04:05:38 SET exec_single_node_rows_threshold=0;
      04:05:38 SET batch_size=0;
      04:05:38 SET num_nodes=0;
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists hive_pi() returns double
      04:05:38 location 'file:/tmp/test-warehouse/hive-exec.jar'
      04:05:38 symbol='org.apache.hadoop.hive.ql.udf.UDFPI';
      04:05:38 
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists foo() returns double
      04:05:38 location 'file:/tmp/test-warehouse/not-a-real-file.so'
      04:05:38 symbol='FnDoesNotExist';
      04:05:38 
      04:05:38  TestUdfs.test_udf_errors[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      04:05:38 [gw2] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      04:05:38 query_test/test_udfs.py:79: in test_udf_errors
      04:05:38     self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
      04:05:38 common/impala_test_suite.py:327: in run_test_case
      04:05:38     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      04:05:38 common/impala_test_suite.py:218: in __verify_exceptions
      04:05:38     (expected_str, actual_str)
      04:05:38 E   AssertionError: Unexpected exception string. Expected: Could not load binary: file:/tmp/test-warehouse/not-a-real-file.so
      04:05:38 E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Could not load binary: __HDFS_FILENAME__(2): No such file or directory
      04:05:38 ---------------------------- Captured stderr setup -----------------------------
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 DROP DATABASE IF EXISTS `test_udf_errors_39fb7221` CASCADE;
      04:05:38 
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 CREATE DATABASE `test_udf_errors_39fb7221`;
      04:05:38 
      04:05:38 MainThread: Created database "test_udf_errors_39fb7221" for test ID "query_test/test_udfs.py::TestUdfs::()::test_udf_errors[exec_option: {'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]"
      04:05:38 ----------------------------- Captured stderr call -----------------------------
      04:05:38 -- executing against localhost:21000
      04:05:38 use test_udf_errors_39fb7221;
      04:05:38 
      04:05:38 SET disable_codegen=1;
      04:05:38 SET abort_on_error=1;
      04:05:38 SET exec_single_node_rows_threshold=100;
      04:05:38 SET batch_size=0;
      04:05:38 SET num_nodes=0;
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists hive_pi() returns double
      04:05:38 location 'file:/tmp/test-warehouse/hive-exec.jar'
      04:05:38 symbol='org.apache.hadoop.hive.ql.udf.UDFPI';
      04:05:38 
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists foo() returns double
      04:05:38 location 'file:/tmp/test-warehouse/not-a-real-file.so'
      04:05:38 symbol='FnDoesNotExist';
      04:05:38 
      04:05:38  TestUdfs.test_udf_errors[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      04:05:38 [gw1] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      04:05:38 query_test/test_udfs.py:79: in test_udf_errors
      04:05:38     self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
      04:05:38 common/impala_test_suite.py:327: in run_test_case
      04:05:38     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      04:05:38 common/impala_test_suite.py:218: in __verify_exceptions
      04:05:38     (expected_str, actual_str)
      04:05:38 E   AssertionError: Unexpected exception string. Expected: Could not load binary: file:/tmp/test-warehouse/not-a-real-file.so
      04:05:38 E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Could not load binary: __HDFS_FILENAME__(2): No such file or directory
      04:05:38 ---------------------------- Captured stderr setup -----------------------------
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 DROP DATABASE IF EXISTS `test_udf_errors_a0ea005b` CASCADE;
      04:05:38 
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 CREATE DATABASE `test_udf_errors_a0ea005b`;
      04:05:38 
      04:05:38 MainThread: Created database "test_udf_errors_a0ea005b" for test ID "query_test/test_udfs.py::TestUdfs::()::test_udf_errors[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]"
      04:05:38 ----------------------------- Captured stderr call -----------------------------
      04:05:38 -- executing against localhost:21000
      04:05:38 use test_udf_errors_a0ea005b;
      04:05:38 
      04:05:38 SET disable_codegen=1;
      04:05:38 SET abort_on_error=1;
      04:05:38 SET exec_single_node_rows_threshold=0;
      04:05:38 SET batch_size=0;
      04:05:38 SET num_nodes=0;
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists hive_pi() returns double
      04:05:38 location 'file:/tmp/test-warehouse/hive-exec.jar'
      04:05:38 symbol='org.apache.hadoop.hive.ql.udf.UDFPI';
      04:05:38 
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists foo() returns double
      04:05:38 location 'file:/tmp/test-warehouse/not-a-real-file.so'
      04:05:38 symbol='FnDoesNotExist';
      04:05:38 
      04:05:38  TestUdfs.test_udf_errors[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none] 
      04:05:38 [gw1] linux2 -- Python 2.6.6 /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/bin/../infra/python/env/bin/python
      04:05:38 query_test/test_udfs.py:79: in test_udf_errors
      04:05:38     self.run_test_case('QueryTest/udf-errors', vector, use_db=unique_database)
      04:05:38 common/impala_test_suite.py:327: in run_test_case
      04:05:38     self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
      04:05:38 common/impala_test_suite.py:218: in __verify_exceptions
      04:05:38     (expected_str, actual_str)
      04:05:38 E   AssertionError: Unexpected exception string. Expected: Could not load binary: file:/tmp/test-warehouse/not-a-real-file.so
      04:05:38 E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: AnalysisException: Could not load binary: __HDFS_FILENAME__(2): No such file or directory
      04:05:38 ---------------------------- Captured stderr setup -----------------------------
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 DROP DATABASE IF EXISTS `test_udf_errors_e4993d5` CASCADE;
      04:05:38 
      04:05:38 SET sync_ddl=False;
      04:05:38 -- executing against localhost:21000
      04:05:38 CREATE DATABASE `test_udf_errors_e4993d5`;
      04:05:38 
      04:05:38 MainThread: Created database "test_udf_errors_e4993d5" for test ID "query_test/test_udfs.py::TestUdfs::()::test_udf_errors[exec_option: {'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: text/none]"
      04:05:38 ----------------------------- Captured stderr call -----------------------------
      04:05:38 -- executing against localhost:21000
      04:05:38 use test_udf_errors_e4993d5;
      04:05:38 
      04:05:38 SET disable_codegen=1;
      04:05:38 SET abort_on_error=1;
      04:05:38 SET exec_single_node_rows_threshold=100;
      04:05:38 SET batch_size=0;
      04:05:38 SET num_nodes=0;
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists hive_pi() returns double
      04:05:38 location 'file:/tmp/test-warehouse/hive-exec.jar'
      04:05:38 symbol='org.apache.hadoop.hive.ql.udf.UDFPI';
      04:05:38 
      04:05:38 -- executing against localhost:21000
      04:05:38 create function if not exists foo() returns double
      04:05:38 location 'file:/tmp/test-warehouse/not-a-real-file.so'
      04:05:38 symbol='FnDoesNotExist';
      04:05:38 
      04:05:38  generated xml file: /data/jenkins/workspace/impala-umbrella-build-and-test/repos/Impala/logs/ee_tests/results/TEST-impala-parallel.xml 
      04:05:38 =========================== short test summary info ============================
      

        Activity

        kwho Michael Ho added a comment -

        From what I can tell, this is not related, but I am double-checking. The part I changed recently only kicks in when an expected exception doesn't happen. It appears that $FILESYSTEM_PREFIX in the expected exception string was replaced correctly; the problem seems to be with the actual exception string returned by Impala.

        kwho Michael Ho added a comment -

        I think the problem has to do with the following commit:

        commit 858f5c219710f1b72b25e509643f0cf9e1113dee
        Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
        Date:   Fri Nov 4 17:12:04 2016 -0700
        
            IMPALA-4363: Add Parquet timestamp validation
        
            Before this patch, we would simply read the INT96 Parquet timestamp
            representation and assume that it's valid. However, not all bit
            permutations represent a valid timestamp. One of the boost functions
            raised an exception (that we didn't catch) when passed an invalid
            boost date object, which resulted in a crash. This patch fixes
            problem by validating that the date falls into 1400..9999 year
            range as we are scanning Parquet.
        
            Change-Id: Ieaab5d33e6f0df831d0e67e1d318e5416ffb90ac
            Reviewed-on: http://gerrit.cloudera.org:8080/5343
            Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
            Tested-by: Internal Jenkins
        

        In particular, the following changed line caused the breakage above: it filters actual_str but leaves expected_str untouched. The change may need to apply the same replacement to expected_str too.

        -    actual_str = actual_str.replace('\n', '')
        +    actual_str = ''.join(apply_error_match_filter([actual_str.replace('\n', '')]))
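
The mismatch can be reproduced in isolation. This is a minimal sketch, assuming the filter rewrites file paths to the __HDFS_FILENAME__ marker. `fake_error_match_filter`, its regex, and the unfiltered form of the actual message are hypothetical stand-ins for illustration, not the real `apply_error_match_filter` or real Impala output:

```python
import re

# Hypothetical stand-in for apply_error_match_filter. The pattern is an
# assumption for illustration, not the framework's actual regex.
PATH_RE = re.compile(r'file:/[^\s(]+')

def fake_error_match_filter(lines):
    """Replace anything that looks like a file: URI with the marker."""
    return [PATH_RE.sub('__HDFS_FILENAME__', line) for line in lines]

expected_str = 'Could not load binary: file:/tmp/test-warehouse/not-a-real-file.so'

# Assumed shape of the unfiltered message; only the filtered form appears in the log.
raw_actual = ('AnalysisException: Could not load binary: '
              'file:/tmp/test-warehouse/not-a-real-file.so(2): '
              'No such file or directory')

# Filtering only the actual string, as in the diff above, breaks the substring check.
actual_str = ''.join(fake_error_match_filter([raw_actual.replace('\n', '')]))
only_actual_filtered = expected_str in actual_str

# Filtering the expected string the same way restores the match.
filtered_expected = ''.join(fake_error_match_filter([expected_str]))
both_filtered = filtered_expected in actual_str
```

Under these assumptions, `only_actual_filtered` is False and `both_filtered` is True, which is consistent with the assertion failure in the console snippet.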
        
        tarasbob Taras Bobrovytsky added a comment -
        commit 1083639ff2a09ff157d3e8c6880973d954a20bb9
        Author: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
        Date:   Mon Dec 5 11:38:27 2016 -0800
        
            IMPALA-4585: Allow the $DATABASE template in the CATCH section
        
            In a recent change (IMPALA-4363) we introduced a change where all file
            paths in .test files should be replaced with '__HDFS_FILENAME__'. This
            caused problems for tests on non-HDFS file systems and we also lost some
            test coverage. This patch fixes the problem by allowing the $DATABASE
            template in the catch section of the .test file.
        
            Change-Id: If0f6ae8dea7ac4cdaf0c61ebd8f0c589c353a96e
            Reviewed-on: http://gerrit.cloudera.org:8080/5372
            Reviewed-by: Dan Hecht <dhecht@cloudera.com>
            Tested-by: Impala Public Jenkins
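
The idea of expanding a $DATABASE template in the expected string can be sketched as follows. This is only an illustration under stated assumptions: `expand_catch_template` is a hypothetical helper, and the error message used here is made up. Neither is taken from the patch itself:

```python
def expand_catch_template(expected_str, use_db):
    """Substitute the $DATABASE template in an expected-exception string.

    Hypothetical helper: if no database name is given, the string is
    returned unchanged.
    """
    if use_db:
        return expected_str.replace('$DATABASE', use_db)
    return expected_str

# Made-up CATCH text and error message, for illustration only.
catch_section = 'Table does not exist: $DATABASE.no_such_tbl'
unique_database = 'test_udf_errors_cace194a'

expected = expand_catch_template(catch_section, unique_database)
actual = ('AnalysisException: Table does not exist: '
          'test_udf_errors_cace194a.no_such_tbl')

# The comparison stays a plain substring check, as in the traceback above.
matched = expected in actual
```

Because the database name is generated per test, the expected string can only be matched after the template is expanded with the same name the test ran against.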
        

          People

          • Assignee:
            tarasbob Taras Bobrovytsky
            Reporter:
            alex.behm Alexander Behm