Details
Bug
Status: Resolved
Major
Resolution: Fixed
1.4.0, 1.5.0, 1.6.0
None
CentOS release 6.5 (2.6.32-431.11.9.el6.ucloud.x86_64)
KUDU-1.4.0-1.cdh5.12.1.p0.10
IMPALA 2.6.0
x86-64
Intel CPU
Description
I ran the following SQL repeatedly
while refreshing the 8050/scans page at the same time.
SQL:
select count(xx_id),count(yy_id),count(time) from test_table where event_id =29983;
"Cells read from disk" is much greater than the table size when materializing_iterator_do_pushdown = true (the default).
After setting materializing_iterator_do_pushdown = false,
"Cells read from disk" dropped to a reasonable value (close to the table size)
and the SQL ran faster.
Here are the details:
Table under test:
CREATE TABLE rawdata.test_table (
  day INT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  user_id BIGINT NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  time TIMESTAMP NOT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  event_id INT NULL ENCODING BIT_SHUFFLE COMPRESSION DEFAULT_COMPRESSION,
  distinct_id STRING NULL ENCODING DICT_ENCODING COMPRESSION DEFAULT_COMPRESSION,
  ... ... other fields ... ...
  PRIMARY KEY (day, user_id, time, _offset)
)
PARTITION BY HASH (user_id) PARTITIONS 9
STORED AS KUDU
TBLPROPERTIES ( ... );
Table size (select count(1) from test_table): 19510709
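To put a number on "much greater", the counter can be compared against the row count. The following is only a minimal sketch: it assumes the "Cells read from disk" value is copied by hand from the 8050/scans page, and both the function name `read_amplification` and the sample counter value are hypothetical, not measurements from this report.

```python
# Row count reported above: select count(1) from test_table
TABLE_ROWS = 19510709


def read_amplification(cells_read, table_rows=TABLE_ROWS):
    """Return how many cells were read from disk per table row."""
    if table_rows <= 0:
        raise ValueError("table_rows must be positive")
    return cells_read / table_rows


# Hypothetical counter value for illustration only; the real values
# are in the attached screenshots and are not reproduced here.
example_cells_read = 3 * TABLE_ROWS
print(read_amplification(example_cells_read))
```

Running it once per case with the value shown on the scans page makes the difference between the two settings directly comparable.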
CASE 1, materializing_iterator_do_pushdown = true
756ACA6F105F0905EBCB79B940FFCE86.jpg
CASE 2, materializing_iterator_do_pushdown = false (SQL ran faster)
F8C604537B8E921DDCCA78995DC11BDA.jpg
It looks like Kudu scans the table multiple times for this simple SQL, probably because of a bug.