IMPALA-293

Impala is unable to query RCFile tables which describe fewer columns than the file's header.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 0.7
    • Fix Version/s: Impala 1.0.1
    • Component/s: None
    • Labels: None
    • Environment:
      - Table backed by RCFile
      - Define fewer columns than the file contains
      -- For example 8 columns in the table definition, but 9 columns in the actual RCFile.

      Description

      If you define a table backed by RCFile with fewer columns than the file actually contains, Impala fails when querying it. The failure is relatively graceful: the query simply returns 0 results.

      Here is the offending area of code:
      https://github.com/cloudera/impala/blob/master/be/src/exec/hdfs-rcfile-scanner.cc#L199-L203

            if (file_num_cols != num_cols_) {
              stringstream ss;
              ss << "Unexpected hive.io.rcfile.column.number value!"
                 << " Expected: " << num_cols_ << "." << " Read: " << file_num_cols;
              return Status(ss.str());
            }
      
      variable        description
      file_num_cols   number of columns read from the RCFile header
      num_cols_       number of columns defined in the table

      Maybe the check should instead accept the file whenever the following holds?

      num_cols_ <= file_num_cols
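
To make the suggestion concrete, here is a minimal standalone sketch of the relaxed check. It is not the actual Impala fix: `CheckRcFileColumnCount` is a hypothetical helper, it returns a plain error string instead of Impala's `Status`, and the "Expected at least" wording is an assumption. It only rejects a file whose header reports fewer columns than the table defines.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of Impala. Returns an empty string when the
// column counts are acceptable, otherwise an error message.
//   num_cols      : number of columns defined in the table
//   file_num_cols : number of columns read from the RCFile header
std::string CheckRcFileColumnCount(int num_cols, int file_num_cols) {
  // Relaxed condition: the file may carry extra columns, but it must
  // provide at least every column the table definition expects.
  if (file_num_cols < num_cols) {
    std::stringstream ss;
    ss << "Unexpected hive.io.rcfile.column.number value!"
       << " Expected at least: " << num_cols << "." << " Read: " << file_num_cols;
    return ss.str();
  }
  return "";
}
```

With the values from this report, `CheckRcFileColumnCount(8, 9)` returns an empty string, so the scan would proceed. This sketch assumes the table's columns line up positionally with the first columns of the file; a scanner relaxed this way would still need to ignore the extra trailing columns when reading.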
      

      Here is a step-by-step guide to reproducing.

      [forza-2.cloud.rtp.cloudera.com:21000] > DESCRIBE sightings_rc;
      Query: describe sightings_rc
      Query finished, fetching results ...
      +-------------+--------+---------+
      | name        | type   | comment |
      +-------------+--------+---------+
      | id          | int    |         |
      | sighted_at  | int    |         |
      | reported_at | int    |         |
      | loc         | string |         |
      | shape       | string |         |
      | duration    | string |         |
      | description | string |         |
      | lat         | float  |         |
      | lng         | float  |         |
      +-------------+--------+---------+
      Returned 9 row(s) in 0.01s
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      Query finished, fetching results ...
      +----------+
      | count(*) |
      +----------+
      | 60504    |
      +----------+
      Returned 1 row(s) in 0.39s
      [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc DROP COLUMN lng;
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      Query finished, fetching results ...
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Returned 1 row(s) in 0.56s
      

      The logs show the corresponding error message:

      I0419 14:15:49.885937 19655 status.cc:42] Unexpected hive.io.rcfile.column.number value! Expected: 8. Read: 9
      

      Adding the column back returns the table to an operational state.

      [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc ADD COLUMNS (lng FLOAT);
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      +----------+
      | count(*) |
      +----------+
      | 60504    |
      +----------+
      Returned 1 row(s) in 0.55s
      

            People

            • Assignee: Skye Wanderman-Milne (skye)
            • Reporter: Ricky Saltzer (rickysaltzer)