Details
-
New Feature
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 2.3.0
Description
Impala only allows a single (DISTINCT columns) expression in each query.
Note:
If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by specifying NDV(column); a query can contain multiple instances of NDV(column). To make Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option.
[impala:21000] > select count(distinct i_class_id) from item; Query: select count(distinct i_class_id) from item Query finished, fetching results ... 16 Returned 1 row(s) in 1.51s
[impala:21000] > select count(distinct i_class_id), count(distinct i_brand_id) from item; Query: select count(distinct i_class_id), count(distinct i_brand_id) from item ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in select count(distinct i_class_id), count(distinct i_brand_id) from item) at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133) at com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221) at com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89) Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as COUNT(DISTINCT i_class_id); deviating function: COUNT(DISTINCT i_brand_id) at com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196) at com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143) at com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466) at com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347) at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155) at com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130) ... 2 more
Hive supports this:
$ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from item;" Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201302081514_0073, Tracking URL = http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073 Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2013-03-05 22:34:43,255 Stage-1 map = 0%, reduce = 0% 2013-03-05 22:34:49,323 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:50,337 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:51,351 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:52,360 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:53,370 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:54,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.81 sec 2013-03-05 22:34:55,389 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec 2013-03-05 22:34:56,402 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec 2013-03-05 22:34:57,413 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec 2013-03-05 22:34:58,424 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.58 sec MapReduce Total cumulative CPU time: 8 seconds 580 msec Ended Job = job_201302081514_0073 MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 Cumulative CPU: 8.58 sec HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 8 seconds 580 msec OK 16 952 Time taken: 25.666 seconds
Attachments
Issue Links
- breaks
-
IMPALA-8790 IllegalStateException: Illegal reference to non-materialized slot
- Resolved
- causes
-
IMPALA-7240 TestSpillingDebugActionDimensions.test_spilling_regression_exhaustive hits memory limit on exhaustive tests
- Resolved
-
IMPALA-7251 Fix QueryMaintenance calls in Aggregators
- Resolved
-
IMPALA-7397 dcheck in impala::AggregationNode::Close
- Resolved
- is depended upon by
-
IMPALA-7204 Add support for GROUP BY ROLLUP, CUBE and GROUPING SETS
- Open
- is related to
-
IMPALA-10445 The ability to adjust NDV's precision with query option
- Resolved