Details
-
Bug
-
Status: Resolved
-
Blocker
-
Resolution: Incomplete
-
Impala 1.2.4
-
None
-
Amazon EMR Cluster:
AMI version:3.2.1
Hadoop distribution:Amazon 2.4.0
Applications:Hive latest, Impala latest
Master:Running1m3.xlarge
Core:Running1m3.2xlarge
Description
I create this kind of table:
------------------------------------------
CREATE EXTERNAL TABLE IF NOT EXISTS o2 (
log_date BIGINT,
service_key STRING,
user_id STRING,
user_country STRING
)
PARTITIONED BY (year SMALLINT, month TINYINT)
STORED AS PARQUET;
------------------------------------------
Afterwards I import data from HIVE:
------------------------------------------
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
add jar s3://.../parquet-hive-bundle-1.6.0rc2.jar;
INSERT OVERWRITE TABLE o2 PARTITION(year, month) SELECT * FROM o2_hive;
------------------------------------------
Then I run queries:
------------------------------------------
[ip-10-169-190-25.ec2.internal:21000] > select COUNT(DISTINCT(t.user_id)) FROM o2 t WHERE (t.service_key = "my_key") AND month=3 AND year=2014;
Query: select COUNT(DISTINCT(t.user_id)) FROM o2 t WHERE (t.service_key = "my_key") AND month=3 AND year=2014
-----------------------------
count(distinct (t.user_id)) |
-----------------------------
3 |
-----------------------------
Returned 1 row(s) in 0.97s
[ip-10-169-190-25.ec2.internal:21000] > select COUNT(DISTINCT(t.user_id)) FROM o2 t WHERE (t.service_key = "my_key") AND month=3 AND year=2014;
Query: select COUNT(DISTINCT(t.user_id)) FROM o2 t WHERE (t.service_key = "my_key") AND month=3 AND year=2014
-----------------------------
count(distinct (t.user_id)) |
-----------------------------
4 |
-----------------------------
------------------------------------------
As you can see I run the very same query but I get different results. Do you have any idea what might be wrong?
Thank you!