[HIVE-9534] incorrect result set for query that projects a windowed aggregate - ASF JIRA

Harsh J added a comment - 25/Apr/15 05:46

Applying a similar SQL in PostgreSQL or Impala returns an error of the form "DISTINCT is not implemented for window functions". Hive, unless it did add proper support for distinct in such a context, should likely output the same error (if not a bug-fix).

Harsh J added a comment - 25/Apr/15 05:46 Applying a similar SQL in PostgreSQL or Impala returns an error of the form "DISTINCT is not implemented for window functions". Hive, unless it did add proper support for distinct in such a context, should likely output the same error (if not a bug-fix).

N Campbell added a comment - 26/Apr/15 11:25

While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown ~~and~~ the documentation improved at

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

N Campbell added a comment - 26/Apr/15 11:25 While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Chaoyu Tang added a comment - 04/May/15 02:39

I tried the tests (distinct with window function) in MySQL, PostgreSQL and Oracle with following steps:

create table testwindow (col1 int, col2 int);
insert into testwindow values (1, 1);
insert into testwindow values (1, 2);
insert into testwindow values (1, 3);
insert into testwindow values (2, 1);
insert into testwindow values (2, 2);
insert into testwindow values (3, 3);
---
select avg(distinct col1) over() from testwindow;

MySQL: did not work and got the error msg: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '() from testwindow' at line 1

PostgreSQL: did not work and got the error msg: ERROR: DISTINCT is not implemented for window functions Position: 8

Oracle: seemed work but I wonder if it is right, the average is right ( average of 1, 2, 3) with 6 rows:

Hive only returns one row but with correct average: 2

Chaoyu Tang added a comment - 04/May/15 02:39 I tried the tests (distinct with window function) in MySQL, PostgreSQL and Oracle with following steps: create table testwindow (col1 int , col2 int ); insert into testwindow values (1, 1); insert into testwindow values (1, 2); insert into testwindow values (1, 3); insert into testwindow values (2, 1); insert into testwindow values (2, 2); insert into testwindow values (3, 3); --- select avg(distinct col1) over() from testwindow; MySQL: did not work and got the error msg: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '() from testwindow' at line 1 PostgreSQL: did not work and got the error msg: ERROR: DISTINCT is not implemented for window functions Position: 8 Oracle: seemed work but I wonder if it is right, the average is right ( average of 1, 2, 3) with 6 rows: 1 2 2 2 3 2 4 2 5 2 6 2 Hive only returns one row but with correct average: 2

Chaoyu Tang added a comment - 04/May/15 02:42

rhbutani I have not looked deeply into code yet, do you think the feature like distinct with window function has not been supported in Hive at this moment, or may it be a bug? Is it related to HIVE-10586?

Chaoyu Tang added a comment - 04/May/15 02:42 rhbutani I have not looked deeply into code yet, do you think the feature like distinct with window function has not been supported in Hive at this moment, or may it be a bug? Is it related to HIVE-10586 ?

Chaoyu Tang added a comment - 05/May/15 17:51

Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with windowing_clause does actually not work. for example:

Query: 
select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow;
---
Error:
ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY

Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm

Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN.

Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive was computing the distinct value of col1:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testwindow
            Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: col1 (type: int)
              outputColumnNames: col1
              Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: avg(DISTINCT col1)
                keys: col1 (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: avg(DISTINCT KEY._col0:0._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Chaoyu Tang added a comment - 05/May/15 17:51 Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with windowing_clause does actually not work. for example: Query: select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; --- Error: ORA-30487: ORDER BY not allowed here 30487. 00000 - "ORDER BY not allowed here" *Cause: DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN. Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive was computing the distinct value of col1: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: testwindow Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col1 (type: int ) outputColumnNames: col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: avg(DISTINCT col1) keys: col1 (type: int ) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int ) sort order: + Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Group By Operator aggregations: avg(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink

Chaoyu Tang added a comment - 05/May/15 21:20

Looks like that the over(analytic_clause) part will be ignored in a query with distinct in Hive:

function
@init { gParent.pushMsg("function specification", state); }
@after { gParent.popMsg(state); }
    :
    functionName
    LPAREN
      (
        (STAR) => (star=STAR)
        | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)?
      )
    RPAREN (KW_OVER ws=window_specification)?
           -> {$star != null}? ^(TOK_FUNCTIONSTAR functionName $ws?)
           -> {$dist == null}? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?)
                            -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?)
    ;

the query like:
select avg(distinct col1) over() from testwindow; or
select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow;
the over(...) is totally ignored.
So I am going to fix this issue by throwing out unsupported error.

Chaoyu Tang added a comment - 05/May/15 21:20 Looks like that the over(analytic_clause) part will be ignored in a query with distinct in Hive: function @init { gParent.pushMsg( "function specification" , state); } @after { gParent.popMsg(state); } : functionName LPAREN ( (STAR) => (star=STAR) | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)? ) RPAREN (KW_OVER ws=window_specification)? -> {$star != null }? ^(TOK_FUNCTIONSTAR functionName $ws?) -> {$dist == null }? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?) -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?) ; the query like: select avg(distinct col1) over() from testwindow; or select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; the over(...) is totally ignored. So I am going to fix this issue by throwing out unsupported error.

N Campbell added a comment - 06/May/15 00:09

re your comment about ORACLE

select avg(distinct tsint.csint) over () from tsint

null, -1, 0, 1, 10
ORACLE Oracle Database 12c Enterprise Edition ( 12.1.0.2.0) returns 2.5,
2.5, 2.5, 2.5, 2.5

N Campbell added a comment - 06/May/15 00:09 re your comment about ORACLE select avg(distinct tsint.csint) over () from tsint null, -1, 0, 1, 10 ORACLE Oracle Database 12c Enterprise Edition ( 12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5

Chaoyu Tang added a comment - 06/May/15 01:22

Oracle 11.2 treats avg(distinct tsint.csint) over () as analytic function instead of aggregation function, so the query return 4 rows of returns 2.5. Note, there is not order by clause or window clause inside the parenthesis of "over". Could you try query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle c12? It did not work in 11.2.

Chaoyu Tang added a comment - 06/May/15 01:22 Oracle 11.2 treats avg(distinct tsint.csint) over () as analytic function instead of aggregation function, so the query return 4 rows of returns 2.5. Note, there is not order by clause or window clause inside the parenthesis of "over". Could you try query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle c12? It did not work in 11.2.

N Campbell added a comment - 22/Sep/15 02:34

IBM Netezza 7.2, IBM Big Insights, SAP Hana SP10, ORACLE 12 etc all return the expected result.

N Campbell added a comment - 22/Sep/15 02:34 IBM Netezza 7.2, IBM Big Insights, SAP Hana SP10, ORACLE 12 etc all return the expected result.

N Campbell added a comment - 22/Sep/15 02:40

should say Netezza 7.2 and ORACLE 12.

N Campbell added a comment - 22/Sep/15 02:40 should say Netezza 7.2 and ORACLE 12.

Aihua Xu added a comment - 15/Jan/16 14:12

Talked to Chaoyu. I will take a look at the issue.

Aihua Xu added a comment - 15/Jan/16 14:12 Talked to Chaoyu. I will take a look at the issue.

Aihua Xu added a comment - 04/Feb/16 16:24

Attach Patch-1: this is to support AVG(DISTINCT) OVER (PARTITION BY) format. Similar to SUM() and COUNT(), it's to skip the same value when DISTINCT is present.

Aihua Xu added a comment - 04/Feb/16 16:24 Attach Patch-1: this is to support AVG(DISTINCT) OVER (PARTITION BY) format. Similar to SUM() and COUNT(), it's to skip the same value when DISTINCT is present.

Aihua Xu added a comment - 04/Feb/16 16:25

Code review: https://reviews.apache.org/r/43192/. Somehow I can't link to that.

Aihua Xu added a comment - 04/Feb/16 16:25 Code review: https://reviews.apache.org/r/43192/ . Somehow I can't link to that.

Aihua Xu added a comment - 04/Feb/16 16:33

Patch-2: add the missing license comment in the new file.

Aihua Xu added a comment - 04/Feb/16 16:33 Patch-2: add the missing license comment in the new file.

Hive QA added a comment - 06/Feb/16 09:52

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12786290/HIVE-9534.2.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 18 failed/errored test(s), 10052 tests executed
Failed tests:

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_windowing_gby
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query67
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hive.jdbc.TestSSL.testSSLVersion

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6883/

Messages:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed

This message is automatically generated.

ATTACHMENT ID: 12786290 - PreCommit-HIVE-TRUNK-Build

Hive QA added a comment - 06/Feb/16 09:52 Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12786290/HIVE-9534.2.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 18 failed/errored test(s), 10052 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_windowing_gby org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query67 org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6883/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed This message is automatically generated. ATTACHMENT ID: 12786290 - PreCommit-HIVE-TRUNK-Build

Aihua Xu added a comment - 08/Feb/16 17:33

Attached patch-3: other UDAF like RANK() got affected.

Aihua Xu added a comment - 08/Feb/16 17:33 Attached patch-3: other UDAF like RANK() got affected.

Hive QA added a comment - 09/Feb/16 18:49

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12786846/HIVE-9534.3.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 5 failed/errored test(s), 10039 tests executed
Failed tests:

TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax
org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions
org.apache.hive.jdbc.TestSSL.testSSLVersion

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6920/

Messages:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed

This message is automatically generated.

ATTACHMENT ID: 12786846 - PreCommit-HIVE-TRUNK-Build

Hive QA added a comment - 09/Feb/16 18:49 Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12786846/HIVE-9534.3.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 5 failed/errored test(s), 10039 tests executed Failed tests: TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6920/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12786846 - PreCommit-HIVE-TRUNK-Build

Aihua Xu added a comment - 09/Feb/16 20:00

Attached patch-4: fix the unit test invalid_avg_syntax issue.

Aihua Xu added a comment - 09/Feb/16 20:00 Attached patch-4: fix the unit test invalid_avg_syntax issue.

Hive QA added a comment - 11/Feb/16 07:01

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12787127/HIVE-9534.4.patch

SUCCESS: +1 due to 1 test(s) being added or modified.

ERROR: -1 due to 4 failed/errored test(s), 9738 tests executed
Failed tests:

TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-schema_evol_text_nonvec_mapwork_table.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_bmj_schema_evolution.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hive.jdbc.TestSSL.testSSLVersion

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6939/

Messages:

Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed

This message is automatically generated.

ATTACHMENT ID: 12787127 - PreCommit-HIVE-TRUNK-Build

Hive QA added a comment - 11/Feb/16 07:01 Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12787127/HIVE-9534.4.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 4 failed/errored test(s), 9738 tests executed Failed tests: TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-schema_evol_text_nonvec_mapwork_table.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_bmj_schema_evolution.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6939/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed This message is automatically generated. ATTACHMENT ID: 12787127 - PreCommit-HIVE-TRUNK-Build

Yongzhi Chen added a comment - 12/Feb/16 00:06

It seems that your method need a global sorting first, then group the values to have a right answer. Not As what I thought: each node sort and group its own values, then merge (which need combine arrays). Does your method do global sorting first?

Yongzhi Chen added a comment - 12/Feb/16 00:06 It seems that your method need a global sorting first, then group the values to have a right answer. Not As what I thought: each node sort and group its own values, then merge (which need combine arrays). Does your method do global sorting first?

Yongzhi Chen added a comment - 12/Feb/16 02:50

My fault, re-attach patch2 with the test fix.

Yongzhi Chen added a comment - 12/Feb/16 02:50 My fault, re-attach patch2 with the test fix.

Yongzhi Chen added a comment - 12/Feb/16 03:13

As the Avg(distinct) is calculated after global sort, the fix is a good one.
+1

Yongzhi Chen added a comment - 12/Feb/16 03:13 As the Avg(distinct) is calculated after global sort, the fix is a good one. +1

Aihua Xu added a comment - 12/Feb/16 15:58

Thanks Yongzhi. Yeah. The current implementation will do a sort on the partitioning columns so each partition data in the reducer is sorted and we only need to check previous row against the current row to find if it's distinct or not.

Aihua Xu added a comment - 12/Feb/16 15:58 Thanks Yongzhi. Yeah. The current implementation will do a sort on the partitioning columns so each partition data in the reducer is sorted and we only need to check previous row against the current row to find if it's distinct or not.

Aihua Xu added a comment - 12/Feb/16 20:24

Pushed to master. Thanks Yongzhi for reviewing.

Aihua Xu added a comment - 12/Feb/16 20:24 Pushed to master. Thanks Yongzhi for reviewing.

Lefty Leverenz added a comment - 13/Feb/16 08:09

Does the documentation still need to be improved?

Lefty Leverenz added a comment - 13/Feb/16 08:09 Does the documentation still need to be improved?

Sergey Shelukhin added a comment - 05/Apr/16 18:00

Does this need to be backported to 2.0.1? Seems like a serious bug

Sergey Shelukhin added a comment - 05/Apr/16 18:00 Does this need to be backported to 2.0.1? Seems like a serious bug

Aihua Xu added a comment - 05/Apr/16 19:49

That could be considered as new feature rather than bug, I guess. Since technically we don't support distinct in windowing function before. Probably we should not backport to 2.0.1.

Aihua Xu added a comment - 05/Apr/16 19:49 That could be considered as new feature rather than bug, I guess. Since technically we don't support distinct in windowing function before. Probably we should not backport to 2.0.1.

Sergey Shelukhin added a comment - 05/Apr/16 20:03

Makes sense, thanks.

Sergey Shelukhin added a comment - 05/Apr/16 20:03 Makes sense, thanks.

Lefty Leverenz added a comment - 06/Apr/16 00:20

Doc query: the6campbells said in the second comment on this issue:

While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

So aihuaxu, what documentation changes are needed now that this issue is resolved?

Lefty Leverenz added a comment - 06/Apr/16 00:20 Doc query: the6campbells said in the second comment on this issue: While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics So aihuaxu , what documentation changes are needed now that this issue is resolved?

Aihua Xu added a comment - 07/Apr/16 14:00

leftylev I just updated the doc.

I also filed ~~HIVE-13453~~ to track the improvement to address the current limitation.

Aihua Xu added a comment - 07/Apr/16 14:00 leftylev I just updated the doc. I also filed HIVE-13453 to track the improvement to address the current limitation.

Lefty Leverenz added a comment - 08/Apr/16 08:14

Thanks aihuaxu!

Lefty Leverenz added a comment - 08/Apr/16 08:14 Thanks aihuaxu !

Hive

incorrect result set for query that projects a windowed aggregate

Details

Description

Attachments

Attachments

Issue Links

Sub-Tasks

Activity

People

Dates