[IMPALA-5287] Add a test for skip.header.line.count on compressed files - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: Impala 2.9.0
Fix Version/s: Impala 2.9.0
Component/s: Backend
Labels: None
Target Version: Impala 2.9.0

Description

Before the fix for IMPALA-3905 was merged, the HDFS text scanner initialized the decompressor only after finding the first row. This was wrong, but it caused no problems for normal compressed tables: for those we issue only a single scan range and can therefore skip the search for the first newline character.

However, this broke skipping header lines at the beginning of compressed files. We should add a test for skip.header.line.count on compressed files to prevent a regression in the future.
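The property such a test needs to check can be sketched in a few lines of Python. This is only an illustration of the expected behavior, not Impala's scanner code or its test framework: `read_rows` and the sample data are hypothetical. The idea is that skipping header lines must operate on decompressed bytes, so a compressed file and its uncompressed counterpart yield the same rows.

```python
import gzip


def read_rows(raw, skip_header_lines, compressed):
    """Hypothetical reader: decompress first, then skip header lines.

    Skipping lines before decompression would search for newlines in
    compressed bytes, which is the kind of ordering bug described above.
    """
    data = gzip.decompress(raw) if compressed else raw
    lines = data.decode("utf-8").splitlines()
    return lines[skip_header_lines:]


# Sample text file with one header line followed by two data rows.
text = "id,name\n1,a\n2,b\n"
plain = text.encode("utf-8")
packed = gzip.compress(plain)

rows_plain = read_rows(plain, skip_header_lines=1, compressed=False)
rows_packed = read_rows(packed, skip_header_lines=1, compressed=True)

# Both representations must return identical data rows.
assert rows_plain == rows_packed
```

A regression test in Impala itself would presumably do the equivalent at the table level: set `skip.header.line.count` on a compressed text table and compare the query results with those of an uncompressed table holding the same data.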

Issue Links

depends upon

IMPALA-4615 test_avro_schema_resolution.py fails with wrong results

Resolved

is related to

IMPALA-5193 Impala reads gzip compressed text as binary when skip.header.line.count > 0

Resolved

People

Assignee: Lars Volker

Reporter: Lars Volker

Votes: 0

Watchers: 1

Dates

Created: 06/May/17 18:55

Updated: 07/Aug/17 22:17

Resolved: 09/May/17 11:26