Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 0.7
None
None
- Table backed by RCFile
- Define fewer columns than the file contains
  - For example, 8 columns in the table definition but 9 columns in the actual RCFile.
Description
If you define a table backed by RCFile with fewer columns than the file actually contains, queries against that table fail. The failure is relatively graceful: the query does not crash, it simply returns 0 results, and the error is only visible in the logs.
Here is the offending area of code:
https://github.com/cloudera/impala/blob/master/be/src/exec/hdfs-rcfile-scanner.cc#L199-L203
    if (file_num_cols != num_cols_) {
      stringstream ss;
      ss << "Unexpected hive.io.rcfile.column.number value!"
         << " Expected: " << num_cols_ << "."
         << " Read: " << file_num_cols;
      return Status(ss.str());
    }
variable | description |
---|---|
file_num_cols | number of columns read from the RCFile header |
num_cols_ | number of columns defined in table |
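Maybe the scanner should instead accept the file whenever the following holds, and return an error only when the file has fewer columns than the table defines?
    num_cols_ <= file_num_cols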
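The difference between the current check and the proposed one can be sketched as a small standalone program. This is only an illustration, not Impala's code: `CheckResult`, `CheckColumnsStrict`, and `CheckColumnsRelaxed` are hypothetical names, and `CheckResult` is a stand-in for Impala's `Status` class. It also assumes the rest of the scanner can safely ignore the extra trailing columns, which this report does not verify.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for Impala's Status: an empty message means OK.
struct CheckResult {
  std::string error;
  bool ok() const { return error.empty(); }
};

// Builds the same message text as the snippet quoted in this report.
std::string ColumnMismatchMessage(int num_cols, int file_num_cols) {
  std::stringstream ss;
  ss << "Unexpected hive.io.rcfile.column.number value!"
     << " Expected: " << num_cols << "."
     << " Read: " << file_num_cols;
  return ss.str();
}

// Current behavior: any difference between the table definition and the
// RCFile header is treated as an error.
CheckResult CheckColumnsStrict(int num_cols, int file_num_cols) {
  if (file_num_cols != num_cols) {
    return CheckResult{ColumnMismatchMessage(num_cols, file_num_cols)};
  }
  return CheckResult{};
}

// Proposed behavior: accept the file when num_cols <= file_num_cols and
// fail only when the file has fewer columns than the table defines.
CheckResult CheckColumnsRelaxed(int num_cols, int file_num_cols) {
  if (file_num_cols < num_cols) {
    return CheckResult{ColumnMismatchMessage(num_cols, file_num_cols)};
  }
  return CheckResult{};
}
```

With 8 columns in the table and 9 in the file, `CheckColumnsStrict(8, 9)` reports an error while `CheckColumnsRelaxed(8, 9)` passes. Both variants still reject a file with fewer columns than the table.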
Steps to reproduce:
    [forza-2.cloud.rtp.cloudera.com:21000] > DESCRIBE sightings_rc;
    Query: describe sightings_rc
    Query finished, fetching results ...
    +-------------+--------+---------+
    | name        | type   | comment |
    +-------------+--------+---------+
    | id          | int    |         |
    | sighted_at  | int    |         |
    | reported_at | int    |         |
    | loc         | string |         |
    | shape       | string |         |
    | duration    | string |         |
    | description | string |         |
    | lat         | float  |         |
    | lng         | float  |         |
    +-------------+--------+---------+
    Returned 9 row(s) in 0.01s
    [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
    Query: select COUNT(*) FROM sightings_rc
    Query finished, fetching results ...
    +----------+
    | count(*) |
    +----------+
    | 60504    |
    +----------+
    Returned 1 row(s) in 0.39s
    [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc DROP COLUMN lng;
    [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
    Query: select COUNT(*) FROM sightings_rc
    Query finished, fetching results ...
    +----------+
    | count(*) |
    +----------+
    | 0        |
    +----------+
    Returned 1 row(s) in 0.56s
The logs show the error message generated by the column-count check:
I0419 14:15:49.885937 19655 status.cc:42] Unexpected hive.io.rcfile.column.number value! Expected: 8. Read: 9
Adding the column back returns the table to an operational state.
    [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc ADD COLUMNS (lng FLOAT);
    [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
    Query: select COUNT(*) FROM sightings_rc
    +----------+
    | count(*) |
    +----------+
    | 60504    |
    +----------+
    Returned 1 row(s) in 0.55s