  1. IMPALA
  2. IMPALA-2109

numRows incorrect when table has TINYINT partitions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.2
    • None
    • None

    Description

      Expected behaviour (note the SMALLINT partition):

      [localhost:21000] > CREATE TABLE working (a String) PARTITIONED BY (b SMALLINT);
      [localhost:21000] > INSERT INTO working (a, b) VALUES ("A",1);
      [localhost:21000] > INSERT INTO working (a, b) VALUES ("B",1);
      [localhost:21000] > INSERT INTO working (a, b) VALUES ("C",2);
      [localhost:21000] > COMPUTE STATS working;
      [localhost:21000] > SHOW TABLE STATS working;
      Query: show TABLE STATS working
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      | 1     | 2     | 2      | 4B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
      | 2     | 1     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
      | Total | 3     | 3      | 6B   | 0B           |                   |        |                   |
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      

      Now the same steps with a TINYINT partition type:

      [localhost:21000] > CREATE TABLE broken (a String) PARTITIONED BY (b TINYINT);
      [localhost:21000] > INSERT INTO broken (a, b) VALUES ("A",1);
      [localhost:21000] > INSERT INTO broken (a, b) VALUES ("B",1);
      [localhost:21000] > INSERT INTO broken (a, b) VALUES ("C",2);
      [localhost:21000] > COMPUTE STATS broken;
      [localhost:21000] > SHOW TABLE STATS broken;
      Query: show TABLE STATS broken
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      | b     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      | 1     | 0     | 2      | 4B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
      | 2     | 0     | 1      | 2B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
      | Total | 3     | 3      | 6B   | 0B           |                   |        |                   |
      +-------+-------+--------+------+--------------+-------------------+--------+-------------------+
      

      Notice that every partition reports #Rows = 0, even though the Total row still shows the correct count of 3. Incorrect per-partition row counts can negatively affect query planning.
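
One way to spot this symptom automatically is to check that the per-partition #Rows values add up to the Total row. The following is only a minimal sketch in Python, assuming the SHOW TABLE STATS output has been captured as text in the layout shown above; the helper name `partition_rows_match_total` is hypothetical and not part of Impala or impala-shell.

```python
# Sample rows copied from the SHOW TABLE STATS output in this report.
WORKING = """
| b     | #Rows | #Files | Size |
| 1     | 2     | 2      | 4B   |
| 2     | 1     | 1      | 2B   |
| Total | 3     | 3      | 6B   |
"""

BROKEN = """
| b     | #Rows | #Files | Size |
| 1     | 0     | 2      | 4B   |
| 2     | 0     | 1      | 2B   |
| Total | 3     | 3      | 6B   |
"""


def partition_rows_match_total(stats_output: str) -> bool:
    """Return True if the per-partition #Rows values sum to the Total row.

    Hypothetical helper: assumes the first column is the partition key
    and the second column is #Rows, as in the output above.
    """
    partition_sum = 0
    total = None
    for line in stats_output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip table borders, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[1] == "#Rows":
            continue  # skip the header row
        rows = int(cells[1])
        if cells[0] == "Total":
            total = rows
        else:
            partition_sum += rows
    if total is None:
        raise ValueError("no Total row found in the stats output")
    return partition_sum == total


print(partition_rows_match_total(WORKING))  # True
print(partition_rows_match_total(BROKEN))   # False
```

A mismatch, as with the `broken` table, suggests the per-partition statistics were not stored correctly even though the table-level count was.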

          People

            Assignee: Sailesh Mukil (sailesh)
            Reporter: Richard Hydomako (rhydomako_impala_384e)
            Votes: 0
            Watchers: 5
