IMPALA-4173

Stats computation on fixed length char columns still tries to compute Max & Avg column length


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      For a fixed-length CHAR column, the query issued to compute statistics still calculates the maximum and average column length, even though both are already known from the column's data type.

      Query issued

      SELECT NDV(l_orderkey) AS l_orderkey, CAST(-1 as BIGINT), MAX(length(l_orderkey)), AVG(length(l_orderkey)) FROM tpch_3000_parquet.l_orderkey_char_11
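
For comparison, a minimal sketch of what the statistics query could look like if the two length aggregates were replaced by constants taken from the declared type. This is an illustration of the idea in the description, not a query that Impala is known to issue; the literal 11 is an assumption derived from the CHAR(11) column in this report.

```sql
-- Hypothetical simplified stats query for a CHAR(11) column.
-- NDV still has to scan the data; max and average length are
-- filled in from the declared type instead of being aggregated.
SELECT NDV(l_orderkey) AS l_orderkey,
       CAST(-1 AS BIGINT),   -- null count placeholder, as in the original query
       11,                   -- max length, assumed equal to the CHAR width
       CAST(11 AS DOUBLE)    -- average length, assumed equal to the CHAR width
FROM tpch_3000_parquet.l_orderkey_char_11
```

Whether a constant average is the right value depends on how length() treats the padding of CHAR values, which this sketch does not settle.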
      

      Table schema

      describe  tpch_3000_parquet.l_orderkey_char_11;
      +------------+----------+---------+
      | name       | type     | comment |
      +------------+----------+---------+
      | l_orderkey | char(11) |         |
      +------------+----------+---------+
      

      Table statistics. It is not clear why the Avg Size is not 11; I guess an overflow might be happening somewhere, as the row count is 18 billion.

      +------------+----------+------------------+--------+----------+-------------------+
      | Column     | Type     | #Distinct Values | #Nulls | Max Size | Avg Size          |
      +------------+----------+------------------+--------+----------+-------------------+
      | l_orderkey | CHAR(11) | 4525097984       | -1     | 11       | 10.38269996643066 |
      +------------+----------+------------------+--------+----------+-------------------+
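
One way to check the overflow guess would be to look at the distribution of length() values directly. The query below is a diagnostic sketch, not part of the original report; it assumes only the table and column shown above.

```sql
-- Diagnostic sketch: count rows per reported length.
-- If every row reports 11, the average should also be 11 and an
-- arithmetic problem in AVG becomes more plausible. If shorter
-- lengths appear, length() is not counting the CHAR padding and
-- an average below 11 would be expected.
SELECT length(l_orderkey) AS len,
       COUNT(*)           AS row_count
FROM tpch_3000_parquet.l_orderkey_char_11
GROUP BY length(l_orderkey)
ORDER BY len
```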
      

          People

            Assignee: Unassigned
            Reporter: Mostafa Mokhtar (mmokhtar)