Issue: IMPALA-293 (project: IMPALA)

Impala is unable to query RCFile tables that define fewer columns than the file header declares


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 0.7
    • Fix Version/s: Impala 1.0.1
    • Component/s: None
    • Labels: None
    • Environment:
      - Table backed by RCFile
      - Table defines fewer columns than the file contains
        - For example, 8 columns in the table definition but 9 columns in the actual RCFile.

    Description

      If you define an RCFile-backed table with fewer columns than the file actually contains, queries against that table fail. The failure is relatively graceful but silent: no error reaches the client, the query simply returns no rows (a COUNT(*) reports 0), and the cause appears only in the logs.

      The offending check is in hdfs-rcfile-scanner.cc:
      https://github.com/cloudera/impala/blob/master/be/src/exec/hdfs-rcfile-scanner.cc#L199-L203

            if (file_num_cols != num_cols_) {
              stringstream ss;
              ss << "Unexpected hive.io.rcfile.column.number value!"
                 << " Expected: " << num_cols_ << "." << " Read: " << file_num_cols;
              return Status(ss.str());
            }
      
      Variable        Description
      file_num_cols   number of columns read from the RCFile header
      num_cols_       number of columns defined in the table

      Perhaps the check should instead accept any file that contains at least as many columns as the table defines, i.e. require only:

      num_cols_ <= file_num_cols
      

      Steps to reproduce (impala-shell session):

      [forza-2.cloud.rtp.cloudera.com:21000] > DESCRIBE sightings_rc;
      Query: describe sightings_rc
      Query finished, fetching results ...
      +-------------+--------+---------+
      | name        | type   | comment |
      +-------------+--------+---------+
      | id          | int    |         |
      | sighted_at  | int    |         |
      | reported_at | int    |         |
      | loc         | string |         |
      | shape       | string |         |
      | duration    | string |         |
      | description | string |         |
      | lat         | float  |         |
      | lng         | float  |         |
      +-------------+--------+---------+
      Returned 9 row(s) in 0.01s
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      Query finished, fetching results ...
      +----------+
      | count(*) |
      +----------+
      | 60504    |
      +----------+
      Returned 1 row(s) in 0.39s
      [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc DROP COLUMN lng;
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      Query finished, fetching results ...
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Returned 1 row(s) in 0.56s
      

      The log shows the error returned by the check quoted above:

      I0419 14:15:49.885937 19655 status.cc:42] Unexpected hive.io.rcfile.column.number value! Expected: 8. Read: 9
      

      Adding the column back returns the table to an operational state.

      [forza-2.cloud.rtp.cloudera.com:21000] > ALTER TABLE sightings_rc ADD COLUMNS (lng FLOAT);
      
      [forza-2.cloud.rtp.cloudera.com:21000] > SELECT COUNT(*) FROM sightings_rc;
      Query: select COUNT(*) FROM sightings_rc
      +----------+
      | count(*) |
      +----------+
      | 60504    |
      +----------+
      Returned 1 row(s) in 0.55s
      

People

  Skye Wanderman-Milne (skye)
  Ricky Saltzer (rickysaltzer)
  Votes: 0
  Watchers: 0
