IMPALA-1988

show column stats returns different results for beeswax and hs2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Impala 2.2
    • Fix Version/s: None
    • Component/s: Clients

    Description

      In the Impala shell (which connects over Beeswax), show column stats compute_stats_db.alltypes works as expected: the stats are returned. When the same statement is executed through impyla over HS2, the second-to-last column, Max Size, is always None, even though cur.description reports its type as INT.

      To reproduce, in the Impala shell:
      [localhost:21000] > show column stats compute_stats_db.alltypes;
      Query: show column stats compute_stats_db.alltypes
      +-----------------+-----------+------------------+--------+----------+----------+
      | Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
      +-----------------+-----------+------------------+--------+----------+----------+
      | id              | INT       | 8161             | -1     | 4        | 4        |
      | bool_col        | BOOLEAN   | 2                | -1     | 1        | 1        |
      | tinyint_col     | TINYINT   | 10               | -1     | 1        | 1        |
      | smallint_col    | SMALLINT  | 10               | -1     | 2        | 2        |
      | int_col         | INT       | 10               | -1     | 4        | 4        |
      | bigint_col      | BIGINT    | 10               | -1     | 8        | 8        |
      | float_col       | FLOAT     | 10               | -1     | 4        | 4        |
      | double_col      | DOUBLE    | 10               | -1     | 8        | 8        |
      | date_string_col | STRING    | 666              | -1     | 8        | 8        |
      | string_col      | STRING    | 10               | -1     | 1        | 1        |
      | timestamp_col   | TIMESTAMP | 5678             | -1     | 16       | 16       |
      | year            | INT       | 2                | 0      | 4        | 4        |
      | month           | INT       | 12               | 0      | 4        | 4        |
      +-----------------+-----------+------------------+--------+----------+----------+
      Fetched 13 row(s) in 0.01s

      In IPython (plain Python works just as well):

      In [1]: from impala.dbapi import connect

      In [2]: conn = connect()

      In [3]: cur = conn.cursor()

      In [4]: cur.execute("show column stats compute_stats_db.alltypes")

      In [5]: cur.fetchall()
      Out[5]:
      [('id', 'INT', 8161, -1, None, 4.0),
      ('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
      ('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
      ('smallint_col', 'SMALLINT', 10, -1, None, 2.0),
      ('int_col', 'INT', 10, -1, None, 4.0),
      ('bigint_col', 'BIGINT', 10, -1, None, 8.0),
      ('float_col', 'FLOAT', 10, -1, None, 4.0),
      ('double_col', 'DOUBLE', 10, -1, None, 8.0),
      ('date_string_col', 'STRING', 666, -1, None, 8.0),
      ('string_col', 'STRING', 10, -1, None, 1.0),
      ('timestamp_col', 'TIMESTAMP', 5678, -1, None, 16.0),
      ('year', 'INT', 2, 0, None, 4.0),
      ('month', 'INT', 12, 0, None, 4.0)]

      In [6]: cur.description
      Out[6]:
      [('Column', 'STRING', None, None, None, None, None),
      ('Type', 'STRING', None, None, None, None, None),
      ('#Distinct Values', 'BIGINT', None, None, None, None, None),
      ('#Nulls', 'BIGINT', None, None, None, None, None),
      ('Max Size', 'INT', None, None, None, None, None),
      ('Avg Size', 'DOUBLE', None, None, None, None, None)]

      For those unfamiliar with impyla:
      fetchall() returns the query results; each tuple is one row.
      description returns the column metadata; the first two entries of each tuple are the column name and type, e.g. the first column is named Column and is of type STRING.
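
To make the mismatch easier to spot, here is a minimal sketch that pairs the names from cur.description with the rows from cur.fetchall() and lists every column that is None in all rows. The data is copied from the output above (first three rows only) so the snippet runs without a cluster; the helper names all_none_columns and rows_as_dicts are hypothetical and not part of impyla.

```python
# Metadata and rows copied from the HS2 output in this report
# (only the first three rows are kept, for brevity).
description = [
    ('Column', 'STRING', None, None, None, None, None),
    ('Type', 'STRING', None, None, None, None, None),
    ('#Distinct Values', 'BIGINT', None, None, None, None, None),
    ('#Nulls', 'BIGINT', None, None, None, None, None),
    ('Max Size', 'INT', None, None, None, None, None),
    ('Avg Size', 'DOUBLE', None, None, None, None, None),
]
rows = [
    ('id', 'INT', 8161, -1, None, 4.0),
    ('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
    ('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
]


def all_none_columns(description, rows):
    """Return the names of columns whose value is None in every row.

    Hypothetical helper for illustration; not an impyla function.
    """
    names = [col[0] for col in description]
    return [
        name
        for index, name in enumerate(names)
        if rows and all(row[index] is None for row in rows)
    ]


def rows_as_dicts(description, rows):
    """Label each row tuple with the column names from the description."""
    names = [col[0] for col in description]
    return [dict(zip(names, row)) for row in rows]


print(all_none_columns(description, rows))  # ['Max Size']
```

Against a live connection, the same helpers could presumably be called as all_none_columns(cur.description, cur.fetchall()) right after cur.execute(...).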

People

    • Assignee: Unassigned
    • Reporter: Alex Leblang