Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9534

incorrect result set for query that projects a windowed aggregate

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 2.1.0
    • PTF-Windowing
    • None

    Description

      Result set returned by Hive has one row instead of 5

      select avg(distinct tsint.csint) over () from tsint 
      
      create table  if not exists TSINT (RNUM int , CSINT smallint)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
       STORED AS TEXTFILE;
      
      0|\N
      1|-1
      2|0
      3|1
      4|10
      

      Attachments

        1. HIVE-9534.4.patch
          15 kB
          Aihua Xu
        2. HIVE-9534.3.patch
          14 kB
          Aihua Xu
        3. HIVE-9534.2.patch
          14 kB
          Aihua Xu
        4. HIVE-9534.1.patch
          13 kB
          Aihua Xu

        Issue Links

          Activity

            qwertymaniac Harsh J added a comment -

            Applying a similar SQL in PostgreSQL or Impala returns an error of the form "DISTINCT is not implemented for window functions". Hive, unless it did add proper support for distinct in such a context, should likely output the same error (if not a bug-fix).

            qwertymaniac Harsh J added a comment - Applying a similar SQL in PostgreSQL or Impala returns an error of the form "DISTINCT is not implemented for window functions". Hive, unless it did add proper support for distinct in such a context, should likely output the same error (if not a bug-fix).
            the6campbells N Campbell added a comment -

            While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at

            https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

            the6campbells N Campbell added a comment - While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
            ctang Chaoyu Tang added a comment -

            I tried the tests (distinct with window function) in MySQL, PostgreSQL and Oracle with following steps:

            create table testwindow (col1 int, col2 int);
            insert into testwindow values (1, 1);
            insert into testwindow values (1, 2);
            insert into testwindow values (1, 3);
            insert into testwindow values (2, 1);
            insert into testwindow values (2, 2);
            insert into testwindow values (3, 3);
            ---
            select avg(distinct col1) over() from testwindow;
            

            MySQL: did not work and got the error msg: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '() from testwindow' at line 1

            PostgreSQL: did not work and got the error msg: ERROR: DISTINCT is not implemented for window functions Position: 8

            Oracle: seemed work but I wonder if it is right, the average is right ( average of 1, 2, 3) with 6 rows:

            1  2
            2  2
            3  2
            4  2
            5  2
            6  2
            

            Hive only returns one row but with correct average: 2

            ctang Chaoyu Tang added a comment - I tried the tests (distinct with window function) in MySQL, PostgreSQL and Oracle with following steps: create table testwindow (col1 int , col2 int ); insert into testwindow values (1, 1); insert into testwindow values (1, 2); insert into testwindow values (1, 3); insert into testwindow values (2, 1); insert into testwindow values (2, 2); insert into testwindow values (3, 3); --- select avg(distinct col1) over() from testwindow; MySQL: did not work and got the error msg: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '() from testwindow' at line 1 PostgreSQL: did not work and got the error msg: ERROR: DISTINCT is not implemented for window functions Position: 8 Oracle: seemed work but I wonder if it is right, the average is right ( average of 1, 2, 3) with 6 rows: 1 2 2 2 3 2 4 2 5 2 6 2 Hive only returns one row but with correct average: 2
            ctang Chaoyu Tang added a comment -

            rhbutani I have not looked deeply into code yet, do you think the feature like distinct with window function has not been supported in Hive at this moment, or may it be a bug? Is it related to HIVE-10586?

            ctang Chaoyu Tang added a comment - rhbutani I have not looked deeply into code yet, do you think the feature like distinct with window function has not been supported in Hive at this moment, or may it be a bug? Is it related to HIVE-10586 ?
            ctang Chaoyu Tang added a comment -

            Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with windowing_clause does actually not work. for example:

            Query: 
            select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow;
            ---
            Error:
            ORA-30487: ORDER BY not allowed here
            30487. 00000 -  "ORDER BY not allowed here"
            *Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
            

            Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm

            Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN.
            

            Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive was computing the distinct value of col1:

            STAGE DEPENDENCIES:
              Stage-1 is a root stage
              Stage-0 depends on stages: Stage-1
            
            STAGE PLANS:
              Stage: Stage-1
                Map Reduce
                  Map Operator Tree:
                      TableScan
                        alias: testwindow
                        Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: col1 (type: int)
                          outputColumnNames: col1
                          Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                          Group By Operator
                            aggregations: avg(DISTINCT col1)
                            keys: col1 (type: int)
                            mode: hash
                            outputColumnNames: _col0, _col1
                            Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                            Reduce Output Operator
                              key expressions: _col0 (type: int)
                              sort order: +
                              Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE
                  Reduce Operator Tree:
                    Group By Operator
                      aggregations: avg(DISTINCT KEY._col0:0._col0)
                      mode: mergepartial
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                      File Output Operator
                        compressed: false
                        Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                        table:
                            input format: org.apache.hadoop.mapred.TextInputFormat
                            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            
              Stage: Stage-0
                Fetch Operator
                  limit: -1
                  Processor Tree:
                    ListSink
            
            ctang Chaoyu Tang added a comment - Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with windowing_clause does actually not work. for example: Query: select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; --- Error: ORA-30487: ORDER BY not allowed here 30487. 00000 - "ORDER BY not allowed here" *Cause: DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN. Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive was computing the distinct value of col1: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: testwindow Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col1 (type: int ) outputColumnNames: col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: avg(DISTINCT col1) keys: col1 (type: int ) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int ) sort order: + Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Group By Operator aggregations: avg(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink
            ctang Chaoyu Tang added a comment -

            Looks like that the over(analytic_clause) part will be ignored in a query with distinct in Hive:

            function
            @init { gParent.pushMsg("function specification", state); }
            @after { gParent.popMsg(state); }
                :
                functionName
                LPAREN
                  (
                    (STAR) => (star=STAR)
                    | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)?
                  )
                RPAREN (KW_OVER ws=window_specification)?
                       -> {$star != null}? ^(TOK_FUNCTIONSTAR functionName $ws?)
                       -> {$dist == null}? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?)
                                        -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?)
                ;
            

            the query like:
            select avg(distinct col1) over() from testwindow; or
            select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow;
            the over(...) is totally ignored.
            So I am going to fix this issue by throwing out unsupported error.

            ctang Chaoyu Tang added a comment - Looks like that the over(analytic_clause) part will be ignored in a query with distinct in Hive: function @init { gParent.pushMsg( "function specification" , state); } @after { gParent.popMsg(state); } : functionName LPAREN ( (STAR) => (star=STAR) | (dist=KW_DISTINCT)? (selectExpression (COMMA selectExpression)*)? ) RPAREN (KW_OVER ws=window_specification)? -> {$star != null }? ^(TOK_FUNCTIONSTAR functionName $ws?) -> {$dist == null }? ^(TOK_FUNCTION functionName (selectExpression+)? $ws?) -> ^(TOK_FUNCTIONDI functionName (selectExpression+)?) ; the query like: select avg(distinct col1) over() from testwindow; or select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; the over(...) is totally ignored. So I am going to fix this issue by throwing out unsupported error.
            the6campbells N Campbell added a comment -

            re your comment about ORACLE

            select avg(distinct tsint.csint) over () from tsint

            null, -1, 0, 1, 10
            ORACLE Oracle Database 12c Enterprise Edition ( 12.1.0.2.0) returns 2.5,
            2.5, 2.5, 2.5, 2.5

            the6campbells N Campbell added a comment - re your comment about ORACLE select avg(distinct tsint.csint) over () from tsint null, -1, 0, 1, 10 ORACLE Oracle Database 12c Enterprise Edition ( 12.1.0.2.0) returns 2.5, 2.5, 2.5, 2.5, 2.5
            ctang Chaoyu Tang added a comment -

            Oracle 11.2 treats avg(distinct tsint.csint) over () as analytic function instead of aggregation function, so the query return 4 rows of returns 2.5. Note, there is not order by clause or window clause inside the parenthesis of "over". Could you try query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle c12? It did not work in 11.2.

            ctang Chaoyu Tang added a comment - Oracle 11.2 treats avg(distinct tsint.csint) over () as analytic function instead of aggregation function, so the query return 4 rows of returns 2.5. Note, there is not order by clause or window clause inside the parenthesis of "over". Could you try query like "select avg(distinct tsint.csint) over (order by rnum rows between 1 preceding and 1 following) from tsint" to see if it works in Oracle c12? It did not work in 11.2.
            the6campbells N Campbell added a comment -

            IBM Netezza 7.2, IBM Big Insights, SAP Hana SP10, ORACLE 12 etc all return the expected result.

            the6campbells N Campbell added a comment - IBM Netezza 7.2, IBM Big Insights, SAP Hana SP10, ORACLE 12 etc all return the expected result.
            the6campbells N Campbell added a comment -

            should say Netezza 7.2 and ORACLE 12.

            the6campbells N Campbell added a comment - should say Netezza 7.2 and ORACLE 12.
            aihuaxu Aihua Xu added a comment -

            Talked to Chaoyu. I will take a look at the issue.

            aihuaxu Aihua Xu added a comment - Talked to Chaoyu. I will take a look at the issue.
            aihuaxu Aihua Xu added a comment -

            Attach Patch-1: this is to support AVG(DISTINCT) OVER (PARTITION BY) format. Similar to SUM() and COUNT(), it's to skip the same value when DISTINCT is present.

            aihuaxu Aihua Xu added a comment - Attach Patch-1: this is to support AVG(DISTINCT) OVER (PARTITION BY) format. Similar to SUM() and COUNT(), it's to skip the same value when DISTINCT is present.
            aihuaxu Aihua Xu added a comment -


            Code review: https://reviews.apache.org/r/43192/. Somehow I can't link to that.

            aihuaxu Aihua Xu added a comment - Code review: https://reviews.apache.org/r/43192/ . Somehow I can't link to that.
            aihuaxu Aihua Xu added a comment -

            Patch-2: add the missing license comment in the new file.

            aihuaxu Aihua Xu added a comment - Patch-2: add the missing license comment in the new file.
            hiveqa Hive QA added a comment -

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12786290/HIVE-9534.2.patch

            SUCCESS: +1 due to 1 test(s) being added or modified.

            ERROR: -1 due to 18 failed/errored test(s), 10052 tests executed
            Failed tests:

            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing_2
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
            org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming
            org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
            org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
            org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_windowing_gby
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax
            org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query67
            org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70
            org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
            org.apache.hive.jdbc.TestSSL.testSSLVersion
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/console
            Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6883/

            Messages:

            Executing org.apache.hive.ptest.execution.TestCheckPhase
            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 18 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12786290 - PreCommit-HIVE-TRUNK-Build

            hiveqa Hive QA added a comment - Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12786290/HIVE-9534.2.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 18 failed/errored test(s), 10052 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_cbo_rp_windowing_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ctas_colname org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_basic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_notin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_streaming org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_windowing_gby org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query67 org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6883/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6883/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed This message is automatically generated. ATTACHMENT ID: 12786290 - PreCommit-HIVE-TRUNK-Build
            aihuaxu Aihua Xu added a comment -

            Attached patch-3: other UDAF like RANK() got affected.

            aihuaxu Aihua Xu added a comment - Attached patch-3: other UDAF like RANK() got affected.
            hiveqa Hive QA added a comment -

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12786846/HIVE-9534.3.patch

            SUCCESS: +1 due to 1 test(s) being added or modified.

            ERROR: -1 due to 5 failed/errored test(s), 10039 tests executed
            Failed tests:

            TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax
            org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions
            org.apache.hive.jdbc.TestSSL.testSSLVersion
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/console
            Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6920/

            Messages:

            Executing org.apache.hive.ptest.execution.TestCheckPhase
            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 5 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12786846 - PreCommit-HIVE-TRUNK-Build

            hiveqa Hive QA added a comment - Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12786846/HIVE-9534.3.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 5 failed/errored test(s), 10039 tests executed Failed tests: TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_avg_syntax org.apache.hadoop.hive.ql.TestTxnCommands2.testInitiatorWithMultipleFailedCompactions org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6920/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6920/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12786846 - PreCommit-HIVE-TRUNK-Build
            aihuaxu Aihua Xu added a comment -

            Attached patch-4: fix the unit test invalid_avg_syntax issue.

            aihuaxu Aihua Xu added a comment - Attached patch-4: fix the unit test invalid_avg_syntax issue.
            hiveqa Hive QA added a comment -

            Here are the results of testing the latest attachment:
            https://issues.apache.org/jira/secure/attachment/12787127/HIVE-9534.4.patch

            SUCCESS: +1 due to 1 test(s) being added or modified.

            ERROR: -1 due to 4 failed/errored test(s), 9738 tests executed
            Failed tests:

            TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-schema_evol_text_nonvec_mapwork_table.q-and-12-more - did not produce a TEST-*.xml file
            TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_bmj_schema_evolution.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file
            org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
            org.apache.hive.jdbc.TestSSL.testSSLVersion
            

            Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/testReport
            Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/console
            Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6939/

            Messages:

            Executing org.apache.hive.ptest.execution.TestCheckPhase
            Executing org.apache.hive.ptest.execution.PrepPhase
            Executing org.apache.hive.ptest.execution.ExecutionPhase
            Executing org.apache.hive.ptest.execution.ReportingPhase
            Tests exited with: TestsFailedException: 4 tests failed
            

            This message is automatically generated.

            ATTACHMENT ID: 12787127 - PreCommit-HIVE-TRUNK-Build

            hiveqa Hive QA added a comment - Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12787127/HIVE-9534.4.patch SUCCESS: +1 due to 1 test(s) being added or modified. ERROR: -1 due to 4 failed/errored test(s), 9738 tests executed Failed tests: TestMiniTezCliDriver-auto_sortmerge_join_13.q-tez_self_join.q-schema_evol_text_nonvec_mapwork_table.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-dynpart_sort_optimization2.q-tez_bmj_schema_evolution.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hive.jdbc.TestSSL.testSSLVersion Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6939/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6939/ Messages: Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed This message is automatically generated. ATTACHMENT ID: 12787127 - PreCommit-HIVE-TRUNK-Build
            ychena Yongzhi Chen added a comment -

            It seems that your method need a global sorting first, then group the values to have a right answer. Not As what I thought: each node sort and group its own values, then merge (which need combine arrays). Does your method do global sorting first?

            ychena Yongzhi Chen added a comment - It seems that your method need a global sorting first, then group the values to have a right answer. Not As what I thought: each node sort and group its own values, then merge (which need combine arrays). Does your method do global sorting first?
            ychena Yongzhi Chen added a comment -

            My fault, re-attach patch2 with the test fix.

            ychena Yongzhi Chen added a comment - My fault, re-attach patch2 with the test fix.
            ychena Yongzhi Chen added a comment -

            As the Avg(distinct) is calculated after global sort, the fix is a good one.
            +1

            ychena Yongzhi Chen added a comment - As the Avg(distinct) is calculated after global sort, the fix is a good one. +1
            aihuaxu Aihua Xu added a comment -

            Thanks Yongzhi. Yeah. The current implementation will do a sort on the partitioning columns so each partition data in the reducer is sorted and we only need to check previous row against the current row to find if it's distinct or not.

            aihuaxu Aihua Xu added a comment - Thanks Yongzhi. Yeah. The current implementation will do a sort on the partitioning columns so each partition data in the reducer is sorted and we only need to check previous row against the current row to find if it's distinct or not.
            aihuaxu Aihua Xu added a comment -

            Pushed to master. Thanks Yongzhi for reviewing.

            aihuaxu Aihua Xu added a comment - Pushed to master. Thanks Yongzhi for reviewing.
            leftyl Lefty Leverenz added a comment -

            Does the documentation still need to be improved?

            leftyl Lefty Leverenz added a comment - Does the documentation still need to be improved?

            Does this need to be backported to 2.0.1? Seems like a serious bug

            sershe Sergey Shelukhin added a comment - Does this need to be backported to 2.0.1? Seems like a serious bug
            aihuaxu Aihua Xu added a comment -

            That could be considered as new feature rather than bug, I guess. Since technically we don't support distinct in windowing function before. Probably we should not backport to 2.0.1.

            aihuaxu Aihua Xu added a comment - That could be considered as new feature rather than bug, I guess. Since technically we don't support distinct in windowing function before. Probably we should not backport to 2.0.1.

            Makes sense, thanks.

            sershe Sergey Shelukhin added a comment - Makes sense, thanks.
            leftyl Lefty Leverenz added a comment -

            Doc query: the6campbells said in the second comment on this issue:

            While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at
            https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

            So aihuaxu, what documentation changes are needed now that this issue is resolved?

            leftyl Lefty Leverenz added a comment - Doc query: the6campbells said in the second comment on this issue: While Postgres may not support ISO-SQL syntax as shown other vendors do. That said, if the engine does not attempt to support distinct aggregation then a parsing error should be thrown and the documentation improved at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics So aihuaxu , what documentation changes are needed now that this issue is resolved?
            aihuaxu Aihua Xu added a comment -

            leftylev I just updated the doc.

            I also filed HIVE-13453 to track the improvement to address the current limitation.

            aihuaxu Aihua Xu added a comment - leftylev I just updated the doc. I also filed HIVE-13453 to track the improvement to address the current limitation.
            leftyl Lefty Leverenz added a comment -

            Thanks aihuaxu!

            leftyl Lefty Leverenz added a comment - Thanks aihuaxu !

            People

              aihuaxu Aihua Xu
              the6campbells N Campbell
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: