Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Impala 4.0.0
Description
The fix for IMPALA-10005 added a new TestCompressedNonText test. The test relies on Hive generating specific file names when writing these compressed tables: it expects a file named 000000_0. Dataload does not appear to guarantee that name, which can lead to failures like this:
query_test/test_compressed_formats.py:142: in test_insensitivity_to_extension
    unique_database, 'tinytable', db_suffix, '000000_0', src_extension, ext)
query_test/test_compressed_formats.py:86: in _copy_and_query_compressed_file
    self.filesystem_client.copy(src_file, dest_file, overwrite=True)
util/hdfs_util.py:79: in copy
    self.hdfs_filesystem_client.copy(src, dst, overwrite)
util/hdfs_util.py:241: in copy
    '{0} copy failed: '.format(self.filesystem_type) + stderr + "; " + stdout
E   AssertionError: HDFS copy failed: cp: `/test-warehouse/tinytable_avro_snap/000000_0': No such file or directory
E   ;
The file listing shows that the file is actually named "/test-warehouse/tinytable_avro_snap/000000_1".
We should update the test to tolerate this. The actual base filename doesn't matter for this test.
I have seen this exactly once so far.
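One possible way to make the test tolerate any base filename is to take the name from a listing of the table directory instead of hardcoding 000000_0. The following is only a sketch, not the actual fix: pick_data_file is a hypothetical helper, and it assumes the test can get the directory contents as a list of paths or base names from its filesystem client.

```python
import posixpath


def pick_data_file(listing):
  """Return the base name of one data file from a directory listing.

  Hypothetical helper: 'listing' is assumed to be a list of paths or base
  names for the entries in the table directory.
  """
  names = (posixpath.basename(p.rstrip('/')) for p in listing)
  # Skip hidden or marker entries (names starting with '_' or '.'), then
  # sort so the choice is deterministic across runs.
  candidates = sorted(n for n in names if n and not n.startswith(('_', '.')))
  assert candidates, "no data files found in listing: {0}".format(listing)
  return candidates[0]


# The listing from the failure above would yield '000000_1'.
base_name = pick_data_file(['/test-warehouse/tinytable_avro_snap/000000_1'])
```

The test could then pass the returned name where it currently passes the literal '000000_0', since only the extension matters for what is being checked.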