Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.9.0
- None
Description
Before the fix for IMPALA-3905 was merged, the HDFS text scanner initialized the decompressor only after finding the first row. This was wrong, but it caused no problems for normal compressed tables: for those we issue only a single scan range and can therefore skip searching for the first newline character.
However, this broke skipping header lines at the beginning of compressed files. We should add a test for skip.header.line.count on compressed files to prevent a regression in the future.
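To illustrate why the order matters, here is a minimal standalone sketch in Python. It is not Impala's scanner code or an existing Impala test; the helper names `read_rows_skipping_header` and `write_sample_file` and the sample file contents are made up for illustration. It shows that header lines can only be located in the decompressed stream, so decompression has to be set up before any line skipping happens.

```python
import gzip
import os
import tempfile


def read_rows_skipping_header(path, skip_header_line_count):
    """Return the data rows of a gzip-compressed text file, without header lines.

    Hypothetical helper for illustration only; not part of Impala.
    """
    rows = []
    # Decompress first: newline characters are only meaningful in the
    # decompressed stream, so header lines cannot be found in the raw bytes.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            if line_number < skip_header_line_count:
                continue  # skip the configured number of header lines
            rows.append(line.rstrip("\n"))
    return rows


def write_sample_file(directory):
    """Write a small gzip-compressed text file with two header lines."""
    path = os.path.join(directory, "table_with_header.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write("id,name\n# comment header\n1,a\n2,b\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    sample = write_sample_file(tmp)
    data_rows = read_rows_skipping_header(sample, 2)
    # The raw file starts with the gzip magic bytes, not with text.
    with open(sample, "rb") as raw:
        raw_prefix = raw.read(2)
```

A regression test for skip.header.line.count on compressed files would check the same property end to end: the rows returned from a compressed table with a header must match the rows of the uncompressed data with the header removed.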
Issue Links
- depends upon
  - IMPALA-4615 test_avro_schema_resolution.py fails with wrong results (Resolved)
- is related to
  - IMPALA-5193 Impala reads gzip compressed text as binary when skip.header.line.count > 0 (Resolved)