Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Impala 2.2
None
Description
In the Impala shell, the statement "show column stats compute_stats_db.alltypes" works as expected and returns the stats. When the same statement is executed through impyla over HS2, every value in the second-to-last column, Max Size, comes back as None.
To reproduce:
In the Impala shell:
[localhost:21000] > show column stats compute_stats_db.alltypes;
Query: show column stats compute_stats_db.alltypes
+-----------------+-----------+------------------+--------+----------+----------+
| Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------+-----------+------------------+--------+----------+----------+
| id              | INT       | 8161             | -1     | 4        | 4        |
| bool_col        | BOOLEAN   | 2                | -1     | 1        | 1        |
| tinyint_col     | TINYINT   | 10               | -1     | 1        | 1        |
| smallint_col    | SMALLINT  | 10               | -1     | 2        | 2        |
| int_col         | INT       | 10               | -1     | 4        | 4        |
| bigint_col      | BIGINT    | 10               | -1     | 8        | 8        |
| float_col       | FLOAT     | 10               | -1     | 4        | 4        |
| double_col      | DOUBLE    | 10               | -1     | 8        | 8        |
| date_string_col | STRING    | 666              | -1     | 8        | 8        |
| string_col      | STRING    | 10               | -1     | 1        | 1        |
| timestamp_col   | TIMESTAMP | 5678             | -1     | 16       | 16       |
| year            | INT       | 2                | 0      | 4        | 4        |
| month           | INT       | 12               | 0      | 4        | 4        |
+-----------------+-----------+------------------+--------+----------+----------+
Fetched 13 row(s) in 0.01s
In ipython (normal python also works fine for this):
In [1]: from impala.dbapi import connect
In [2]: conn = connect()
In [3]: cur = conn.cursor()
In [4]: cur.execute("show column stats compute_stats_db.alltypes")
In [5]: cur.fetchall()
Out[5]:
[('id', 'INT', 8161, -1, None, 4.0),
('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
('smallint_col', 'SMALLINT', 10, -1, None, 2.0),
('int_col', 'INT', 10, -1, None, 4.0),
('bigint_col', 'BIGINT', 10, -1, None, 8.0),
('float_col', 'FLOAT', 10, -1, None, 4.0),
('double_col', 'DOUBLE', 10, -1, None, 8.0),
('date_string_col', 'STRING', 666, -1, None, 8.0),
('string_col', 'STRING', 10, -1, None, 1.0),
('timestamp_col', 'TIMESTAMP', 5678, -1, None, 16.0),
('year', 'INT', 2, 0, None, 4.0),
('month', 'INT', 12, 0, None, 4.0)]
In [6]: cur.description
Out[6]:
[('Column', 'STRING', None, None, None, None, None),
('Type', 'STRING', None, None, None, None, None),
('#Distinct Values', 'BIGINT', None, None, None, None, None),
('#Nulls', 'BIGINT', None, None, None, None, None),
('Max Size', 'INT', None, None, None, None, None),
('Avg Size', 'DOUBLE', None, None, None, None, None)]
For those unfamiliar with impyla:
fetchall() returns the query results; each tuple is a row.
description returns the column labels and types, e.g. the first column is named Column and is of type STRING.
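To make the symptom easier to spot, here is a minimal sketch that pairs the column names from description with the rows from fetchall() and reports any column that is None in every row. The helper name all_null_columns is hypothetical (it is not part of impyla), and the sample data is copied from the session above (first three rows only) so the snippet runs without a cluster.

```python
def all_null_columns(description, rows):
    """Return the names of columns whose value is None in every row.

    `description` is a DB-API style cursor.description (first item of each
    entry is the column name); `rows` is the list returned by fetchall().
    Hypothetical helper for illustration, not part of impyla.
    """
    if not rows:
        return []
    names = [col[0] for col in description]
    return [
        name
        for idx, name in enumerate(names)
        if all(row[idx] is None for row in rows)
    ]


# Sample data copied from the report (first three rows of the result).
description = [
    ('Column', 'STRING', None, None, None, None, None),
    ('Type', 'STRING', None, None, None, None, None),
    ('#Distinct Values', 'BIGINT', None, None, None, None, None),
    ('#Nulls', 'BIGINT', None, None, None, None, None),
    ('Max Size', 'INT', None, None, None, None, None),
    ('Avg Size', 'DOUBLE', None, None, None, None, None),
]
rows = [
    ('id', 'INT', 8161, -1, None, 4.0),
    ('bool_col', 'BOOLEAN', 2, -1, None, 1.0),
    ('tinyint_col', 'TINYINT', 10, -1, None, 1.0),
]

print(all_null_columns(description, rows))
```

Against a live connection the same check would presumably be all_null_columns(cur.description, cur.fetchall()) after the execute call shown in the session.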
Issue Links
duplicates: IMPALA-4962 Max Size column incorrectly has NULLs in column stats via HS2 interface (Resolved)