Hive / HIVE-5356

Move arithmetic UDFs to generic UDF implementations

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11.0
    • Fix Version/s: 0.13.0
    • Component/s: UDF
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change

      Description

      Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs, and Java reflection is used to determine the return type's TypeInfo/ObjectInspector based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params.
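To illustrate the mechanism (a toy Python sketch, not Hive's actual UDFMethodResolver, and using Python annotations in place of Java reflection): the framework reads the return type off the selected evaluate() method, which works for plain types but tells it nothing about precision/scale when the annotation is just "decimal".

```python
import inspect

class OldStyleUDF:
    """Toy stand-in for an old-style UDF: the framework learns the
    return type by reflecting on the evaluate() method it selects."""
    def evaluate(self, a: int, b: int) -> int:
        return a + b

def reflected_return_type(udf) -> type:
    """Mimic of reflection-based return-type resolution: read the
    annotated return type of the chosen evaluate() method. A bare
    annotation like 'decimal' would carry no precision/scale params."""
    return inspect.signature(udf.evaluate).return_annotation
```

Here `reflected_return_type(OldStyleUDF())` yields `int`; a parameterized type would be flattened to its bare name the same way.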

      Hive's decimal type participates in these operations just like int or double. Unlike double or int, however, decimal has precision and scale, which cannot be determined by looking only at the return type (decimal) of the UDF evaluate() method, even when the operands carry specific precision/scale. With a bare "decimal" return type, the default (10, 0) is assumed as the type params. This is certainly not desirable.

      To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the "exact" return type can be determined.
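The idea can be sketched as follows (a minimal Python illustration, not Hive's GenericUDF API; the multiplication rules here — scale s1+s2, precision p1+p2+1 — are assumed for the example):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecimalType:
    """Carries the type parameters an ObjectInspector would expose."""
    precision: int
    scale: int

def initialize_multiply(left: DecimalType, right: DecimalType) -> DecimalType:
    """Sketch of a GenericUDF-style initialize(): derive the result
    type from the operand type parameters instead of relying on the
    bare 'decimal' return type of a reflected evaluate() method."""
    scale = left.scale + right.scale
    precision = left.precision + right.precision + 1
    return DecimalType(precision, scale)
```

With operands decimal(10,2) and decimal(5,3), this sketch yields decimal(16,5), whereas an old-style UDF could only report the parameter-less default decimal(10,0).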

      It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the actual return type has (10, 0) as precision/scale, which might not be desirable. This needs to be documented.

      This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706.

      1. HIVE-5356.12.patch
        1.22 MB
        Xuefu Zhang
      2. HIVE-5356.11.patch
        1.22 MB
        Xuefu Zhang
      3. HIVE-5356.10.patch
        1.22 MB
        Xuefu Zhang
      4. HIVE-5356.9.patch
        1.22 MB
        Xuefu Zhang
      5. HIVE-5356.8.patch
        1.22 MB
        Xuefu Zhang
      6. HIVE-5356.7.patch
        1.22 MB
        Xuefu Zhang
      7. HIVE-5356.6.patch
        1.21 MB
        Xuefu Zhang
      8. HIVE-5356.5.patch
        1.21 MB
        Xuefu Zhang
      9. HIVE-5356.4.patch
        1.20 MB
        Xuefu Zhang
      10. HIVE-5356.3.patch
        1.22 MB
        Xuefu Zhang
      11. HIVE-5356.2.patch
        161 kB
        Xuefu Zhang
      12. HIVE-5356.1.patch
        126 kB
        Xuefu Zhang

        Issue Links

          Activity

          Xuefu Zhang added a comment -

          Submitted the patch to let tests run, new test cases are to be introduced.

          Xuefu Zhang added a comment -

          RB: https://reviews.apache.org/r/15113/
          Jason Dere added a comment -

          Nice. I was playing with the patch and had some questions about the return typeInfo when using decimal with non-decimals:

          hive> explain select cast(1 as decimal(10,2)) / cast(1 as decimal(10,2))  from src;
          ...
                        expressions:
                              expr: (CAST( 1 AS decimal(10,2)) / CAST( 1 AS decimal(10,2)))
                              type: decimal(23,13)
          

          Ok so far.

          hive> explain select cast(1 as decimal(10,2)) / cast(1 as decimal(10,2)) /2 from src;
          ...
                        expressions:
                              expr: ((CAST( 1 AS decimal(10,2)) / CAST( 1 AS decimal(10,2))) / 2)
                              type: double
          

          Maybe we can keep the result type as decimal when possible? At least when the other argument is also exact numeric (int, short, etc).

          Xuefu Zhang added a comment -

          Thanks, Jason. It's a good observation. That's an area I have been grappling with. Currently the type ladder for type promotion (byte->short->int->long->float->double->decimal->string) doesn't seem suitable for preserving precision, and I'm trying to redefine it. I'm also looking at how MySQL handles this. I agree that the decimal type should be preserved in this case.
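The ladder above, together with the "any approximate operand widens to double" behavior discussed later in this thread, can be sketched as (an illustrative simplification, not Hive's actual FunctionRegistry logic):

```python
# Illustrative promotion ladder and rule; simplified, not Hive code.
LADDER = ["byte", "short", "int", "long", "float", "double", "decimal", "string"]
APPROXIMATE = {"float", "double"}

def result_type(a: str, b: str) -> str:
    """Common result type for a binary arithmetic op: any approximate
    operand widens the result to double; otherwise take the type higher
    on the ladder. (Division of exact types is special-cased by the
    patch, e.g. int/int -> decimal, and is not modeled here.)"""
    if a in APPROXIMATE or b in APPROXIMATE:
        return "double"
    return max(a, b, key=LADDER.index)
```

Under this sketch, float/float and decimal/double both come out double, while int+long stays exact as long.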

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12611254/HIVE-5356.patch

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/91/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/91/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-91/source-prep.txt
          + [[ true == \t\r\u\e ]]
          + rm -rf ivy maven
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ awk '{print $2}'
          ++ egrep -v '^X|^Performing status on external'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1537787.
          
          At revision 1537787.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          Xuefu Zhang added a comment -

          Patch #2 is rebased and updated. Still missing test cases.

          Xuefu Zhang added a comment -

          Patch #3 includes additional tests and fixes.

          Xuefu Zhang added a comment -

          Please note that this patch includes the patch for HIVE-5726, which is required to run the test suite.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12611833/HIVE-5356.2.patch

          ERROR: -1 due to 53 failed/errored test(s), 4589 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin4
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_6
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_skew_1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input8
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_literals
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_literal_decimal
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_createas1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_constant_expr
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_createas1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_merge1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_merge2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_pmod
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_round
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_round_2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_12
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_15
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_4
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_5
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_math_funcs
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_windowing_expressions
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_arithmetic_type
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_assert_true2
          org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassComparison
          org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassUnionAll
          org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testArithmeticExpressionVectorization
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_cast1
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_join2
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample1
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample2
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample3
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample4
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample5
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample6
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_sample7
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_udf4
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/124/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/124/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 53 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12611833

          Jason Dere added a comment -

          Just want to point out to folks that this does result in some changes in return type behavior for arithmetic operators, some of which were a bit of a surprise:

          int / int => decimal
          float / float => double
          float * float => double
          float + float => double

          That said, this is still in line with what the SQL standard says about the resulting type of these arithmetic operators (shown below), so I think this is ok:

          1) If the declared type of both operands of a dyadic arithmetic operator is exact numeric, then the declared type of the result is an implementation-defined exact numeric type, with precision and scale determined as follows:
          a) Let S1 and S2 be the scale of the first and second operands respectively.
          b) The precision of the result of addition and subtraction is implementation-defined, and the scale is the maximum of S1 and S2.
          c) The precision of the result of multiplication is implementation-defined, and the scale is S1 + S2.
          d) The precision and scale of the result of division are implementation-defined.
          2) If the declared type of either operand of a dyadic arithmetic operator is approximate numeric, then the declared type of the result is an implementation-defined approximate numeric type.
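The scale portion of these rules is mechanical enough to sketch (a hypothetical helper, not Hive code; precision, and the scale of division, are implementation-defined and so are not modeled):

```python
def result_scale(op: str, s1: int, s2: int) -> int:
    """Scale of the result of a dyadic arithmetic operator on exact
    numerics, per the SQL-standard rules quoted above. Precision (and
    the scale of division) are implementation-defined."""
    if op in ("+", "-"):
        return max(s1, s2)  # rule 1b: addition/subtraction
    if op == "*":
        return s1 + s2      # rule 1c: multiplication
    raise ValueError(f"scale for {op!r} is implementation-defined")
```

For example, decimal(10,2) * decimal(5,3) would carry scale 5 under rule 1c, while the sum of the same operands would carry scale 3 under rule 1b.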

          Xuefu Zhang added a comment -

          Jason Dere Thanks for sharing your findings. While it may seem a little surprising, I think we are doing the right thing by being more in line with the SQL standard and other DB implementations. For instance, int is an exact type, so the result of int/int should be an exact type rather than the current double. Operations between floats could stay float, but the change brings Hive in line with MySQL, and our rule also becomes simpler: as long as there is a non-exact type in the operation, the result type is double.

          Lefty Leverenz added a comment -

          Will this be documented in the wiki?

          It could go in two places (or just one with a link from the other):

          • Hive Data Types
          • Arithmetic Operators
          Xuefu Zhang added a comment -

          [~lefty@hortonworks.com] Yes, this should be documented. However, the scope of the documentation goes beyond this JIRA, as we have other JIRA tasks (existing or to be created) that may also impact data types and numeric operations. Thus, I created HIVE-5748 to wrap this up, including documentation. Thanks.

          Xuefu Zhang added a comment -

          Patch # is updated with fixes for a bunch of test failures. Submitted patch to trigger another test run.

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12612435/HIVE-5356.3.patch

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/159/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/159/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
          + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
          + cd /data/hive-ptest/working/
          + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-159/source-prep.txt
          + [[ true == \t\r\u\e ]]
          + rm -rf ivy maven
          + mkdir -p maven ivy
          + [[ svn = \s\v\n ]]
          + [[ -n '' ]]
          + [[ -d apache-svn-trunk-source ]]
          + [[ ! -d apache-svn-trunk-source/.svn ]]
          + [[ ! -d apache-svn-trunk-source ]]
          + cd apache-svn-trunk-source
          + svn revert -R .
          ++ awk '{print $2}'
          ++ egrep -v '^X|^Performing status on external'
          ++ svn status --no-ignore
          + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
          + svn update
          
          Fetching external item into 'hcatalog/src/test/e2e/harness'
          External at revision 1539538.
          
          At revision 1539538.
          + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
          + patchFilePath=/data/hive-ptest/working/scratch/build.patch
          + [[ -f /data/hive-ptest/working/scratch/build.patch ]]
          + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
          + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
          The patch does not appear to apply with p0, p1, or p2
          + exit 1
          '
          

          This message is automatically generated.

          ATTACHMENT ID: 12612435

          Xuefu Zhang added a comment -

          Patch #5 rebased with latest trunk (again).

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12612668/HIVE-5356.4.patch

          ERROR: -1 due to 3 failed/errored test(s), 4636 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_udf
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_pmod
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_arithmetic_type
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/210/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/210/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12612668

          Xuefu Zhang added a comment -

          Patch #5 fixes remaining test failures. Not final yet; some refactoring pending.

          Xuefu Zhang added a comment -

          Patch #6 is equivalent to #5 with some minor refactoring.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12613404/HIVE-5356.6.patch

          ERROR: -1 due to 10 failed/errored test(s), 4644 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input8
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_num_op_type_conv
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_constant_expr
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_assert_true2
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_coalesce
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_udf_in
          org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testCommonClassComparison
          org.apache.hadoop.hive.ql.exec.TestFunctionRegistry.testGetMethodInternal
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input8
          org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFPosMod.testDecimalPosModDecimal
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/249/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/249/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 10 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12613404

          Xuefu Zhang added a comment -

          Patch #7 attempted to fix the test failures.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12613514/HIVE-5356.7.patch

          ERROR: -1 due to 1 failed/errored test(s), 4644 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/252/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/252/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12613514

          Xuefu Zhang added a comment -

          The above test failure doesn't seem related to my changes. Manually ran that particular test case and it passed.

          Xuefu Zhang added a comment -

          Patch #8, Minor format cleanup.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12613625/HIVE-5356.8.patch

          ERROR: -1 due to 1 failed/errored test(s), 4648 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/262/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/262/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12613625

          Xuefu Zhang added a comment -

          Patch #9 removed a few trailing spaces.

          Hive QA added a comment -

          Overall: +1 all checks pass

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12613886/HIVE-5356.9.patch

          SUCCESS: +1 4653 tests passed

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/280/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/280/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          

          This message is automatically generated.

          ATTACHMENT ID: 12613886

          Xuefu Zhang added a comment -

          Patch #10 fixed a bug and added more test cases.

          Hive QA added a comment -

          Overall: -1 no tests executed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12613948/HIVE-5356.10.patch

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/290/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/290/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Tests failed with: IllegalArgumentException: null
          

          This message is automatically generated.

          ATTACHMENT ID: 12613948

          Xuefu Zhang added a comment -

          Patch #11 rebased with latest trunk.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12614435/HIVE-5356.11.patch

          ERROR: -1 due to 3 failed/errored test(s), 4665 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_case
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_when
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/345/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/345/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12614435

          Xuefu Zhang added a comment -

          Patch #12 addressed the new test failures and provided a temp fix for them. The correct fix will be provided in HIVE-5848.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12614514/HIVE-5356.12.patch

          ERROR: -1 due to 1 failed/errored test(s), 4665 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/348/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/348/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests failed with: TestsFailedException: 1 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12614514

          Xuefu Zhang added a comment -

          The above test failure was due to HIVE-5844.

          Brock Noland added a comment -

          +1

          Xuefu Zhang added a comment -

          Patch committed to trunk. Thanks to Brock for the review.

          Thejas M Nair added a comment -

          I have marked this as an incompatible change in the jira. I think we need to think carefully about this backward incompatible behavior (int/int returning decimal instead of double.)

          Xuefu Zhang added a comment -

          Yeah, it's incompatible. However, it's the correct behaviour. Regardless, it's a matter of whether we want to correct the behavior or continue to follow the incorrect one. For decimal precision/scale support, we decided on the former, and this is just a natural extension of that effort.

          Thejas M Nair added a comment -

          I would call it a SQL compliance issue and not a correctness issue.

          Jason Dere added a comment -

          One big effect in changing int / int => decimal is the performance impact, since decimal arithmetic is quite a bit slower. Did a test similar to the division unit tests, running GenericUDFOPDivide.evaluate() in a loop with both double args and Decimal args. On my laptop, running the loop with decimal division was over 50x slower than using double division.

          Time in ms, 10M iterations:
          double: 260
          decimal: 13993

          The loop I ran for double is below; I had a similar function for decimal:

            public static long testDivideDouble(double a, double b, int iterations) throws HiveException {
              GenericUDFOPDivide udf = new GenericUDFOPDivide();

              DoubleWritable left = new DoubleWritable(a);
              DoubleWritable right = new DoubleWritable(b);
              ObjectInspector[] inputOIs = {
                  PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
                  PrimitiveObjectInspectorFactory.writableDoubleObjectInspector
              };
              DeferredObject[] args = {
                  new DeferredJavaObject(left),
                  new DeferredJavaObject(right),
              };

              // initialize() must be called before evaluate(); it resolves the return ObjectInspector
              udf.initialize(inputOIs);

              long start = System.currentTimeMillis();
              DoubleWritable doubleResult = null;
              for (int idx = 0; idx < iterations; ++idx) {
                doubleResult = (DoubleWritable) udf.evaluate(args);
              }
              long end = System.currentTimeMillis();
              return end - start;
            }
          
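          The same order-of-magnitude gap can be sketched without any Hive classes, using java.math.BigDecimal (which Hive's decimal type wraps) as a stand-in for decimal division. This is an illustrative micro-benchmark under that assumption, not the Hive UDF loop above, so absolute numbers will differ from Jason's:

          ```java
          import java.math.BigDecimal;
          import java.math.MathContext;

          public class DivideBench {

              // Elapsed ms for `iterations` double divisions.
              static long timeDouble(double a, double b, int iterations) {
                  double sink = 0;                      // accumulate so the loop isn't dead code
                  long start = System.currentTimeMillis();
                  for (int i = 0; i < iterations; i++) {
                      sink += a / b;
                  }
                  long elapsed = System.currentTimeMillis() - start;
                  if (Double.isNaN(sink)) throw new AssertionError();
                  return elapsed;
              }

              // Elapsed ms for `iterations` BigDecimal divisions at DECIMAL128 precision
              // (a rounding context is required: 355/113 has no terminating expansion).
              static long timeDecimal(BigDecimal a, BigDecimal b, int iterations) {
                  BigDecimal sink = BigDecimal.ZERO;
                  long start = System.currentTimeMillis();
                  for (int i = 0; i < iterations; i++) {
                      sink = a.divide(b, MathContext.DECIMAL128);
                  }
                  long elapsed = System.currentTimeMillis() - start;
                  if (sink == null) throw new AssertionError();
                  return elapsed;
              }

              public static void main(String[] args) {
                  int n = 10_000_000;
                  System.out.println("double:  " + timeDouble(355.0, 113.0, n) + " ms");
                  System.out.println("decimal: " + timeDecimal(new BigDecimal("355"), new BigDecimal("113"), n) + " ms");
              }
          }
          ```

          Even without JIT-safe benchmark harnessing, the double loop typically finishes one to two orders of magnitude faster than the BigDecimal loop, consistent with the numbers reported above.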
          Jason Dere added a comment -

          So users who are concerned with speeding up queries involving integer division (at the cost of using approximate precision types) may want to consider casting values to double prior to division.

          Eric Hanson added a comment -

          Linking to a related issue (proposed change to make avg(int) return decimal) which is also an incompatible change.

          Eric Hanson added a comment -

          Before this patch was committed, integer-integer division vectorized. Now it does not. This is a performance regression and also a functional regression for "EXPLAIN". This may have been caught by the vectorization tests (see test output in comment above on about 3 Nov), but maybe it was not clear to the developers of this patch because vectorization is pretty new. If a vectorization test .q.out file contains in EXPLAIN output the string "Vectorized execution: true" then the plan vectorizes. It is important that future patches not regress this behavior for performance reasons. I would like to see any regressions to vectorization be fixed before patches are applied, ideally, or else have some discussion and consensus.

          Eric Hanson added a comment -

          The patch contains some lines like this:

          -              Vectorized execution: true"
          Eric Hanson added a comment -

          Correction. The patch contains some lines like this, in EXPLAIN output:

          -              Vectorized execution: true
          

          This indicates a plan change from vectorized execution to row-at-a-time execution.

          Xuefu Zhang added a comment -

          Before this patch was committed, integer-integer division vectorized. Now it does not. This is a performance regression and also a functional regression for "EXPLAIN". This may have been caught by the vectorization tests (see test output in comment above on about 3 Nov), but maybe it was not clear to the developers of this patch because vectorization is pretty new. If a vectorization test .q.out file contains in EXPLAIN output the string "Vectorized execution: true" then the plan vectorizes. It is important that future patches not regress this behavior for performance reasons. I would like to see any regressions to vectorization be fixed before patches are applied, ideally, or else have some discussion and consensus.

          While this patch may prevent vectorization for int/int, I don't think we should emphasize the idea of implementation over functionality, as this has occurred over and over again. I also disagree with the label of "functional regression", for obvious reasons. Rather, I think functionality prevails over implementation. A feature with wrong functionality is as bad as, if not worse than, bad performance. Having said this, I still support vectorization, but I would use this to kill anything that might impact vectorization.

          Xuefu Zhang added a comment -

          I meant "I would NOT ..."

          Eric Hanson added a comment -

          "Vectorized execution: true" in the explain plan was essentially meant to be an assertion that vectorization is working for the query in question. So going forward, again, it's important not to regress to non-vectorized execution. If it needs to be regressed for a reason, then I'd like to see discussion and general agreement or consensus on it. I wish I'd been tuned in to this one – it slipped by me at the time.

          Eric Hanson added a comment -

          I'd prefer that we modify this change to preserve the backward-compatible behavior that int / int yields double. Here’s why:

          It won’t break existing applications.

          The existing behavior is quite reasonable and I’ve never heard anybody complain about it. When you divide integers, you often want the information after the decimal. In Hive, you get it now without having to do a type cast. It’s kind of convenient. I think it’s a minor issue that it is not SQL-standard compliant.

          Double precision divide is almost two orders of magnitude faster than decimal divide.

          It will allow vectorized integer-integer divide to keep working (fixing a regression caused by the patch).

          Hive is production software with a lot of users. Users do “create table as select …” in their workflows quite often. Their applications are depending on the output data types produced. Changing the result of “create table foo as select intCol1 / intCol2 as newCol, …” so that the data type of newCol is different (decimal instead of double) will be seen by some people as a breaking change in their application. Even if it is not a breaking change functionally, it can cause performance regressions for future queries on the data, since they will be then processing decimal instead of double.

          Decimal is a heavy-weight data type that I don’t think should ever be produced by an operator unless the user explicitly asked for it, or one of the input types was decimal. It’s inherently slower to do decimal arithmetic than integer/long/float/double arithmetic. Hive is used in performance-oriented, data warehouse database applications. I don’t think, in general, its code should be changed in a way that invites or causes performance regressions in people’s applications.

          Hive has a small development community. This type of change generates code churn for the community with no strong benefit to the users that I can see, and significant downside to the users.

          I appreciate the effort by contributors to make the decimal(p, s) data type work in Hive. People want to be able to represent currency and very long integer values, and this will help do that nicely. But I would like to see that they ask for it before they get expression results that use it.

          If there is a real strong reason and desire to make the result SQL standard compliant, I think int as a result of int/int is a better choice. Then it'd probably be necessary to deprecate the old way and have a switch to control the behavior for a while.
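          To make the three candidate semantics concrete, here is a small illustrative sketch (in Python, not Hive) of what 7 / 2 yields under each proposal discussed above; the behaviors are modeled for illustration, not taken from Hive's implementation:

```python
from decimal import Decimal

a, b = 7, 2

# Pre-patch Hive behavior: int / int promotes to double.
as_double = a / b                     # 3.5

# SQL-standard behavior in Oracle/SQL Server/DB2/Postgres: integer
# division yields an integer (truncation; for positive operands Python's
# floor division gives the same result).
as_int = a // b                       # 3

# Post-patch behavior: int / int yields a decimal with precision/scale.
as_decimal = Decimal(a) / Decimal(b)  # Decimal('3.5')

print(as_double, as_int, as_decimal)
```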

          Thejas M Nair added a comment - edited

          Here are my concerns with this change (thanks to Jason for highlighting the differences in behavior):

          1. The changes to floating point arithmetic are not backward compatible, and there is no SQL compliance benefit for that.
          2. Regarding integer division returning decimal:
            2.1. It will not be backward compatible with some UDF implementations (I believe this is the same with the change in floating point return type).
            2.2. Integer arithmetic becomes NULL in some cases.
            2.3. More than 50x performance degradation for the arithmetic operation.

          Regarding the drive for making Hive more SQL-standard compliant, I believe the motivation behind it is to make it easier to integrate with external
          tools and easier for people who are familiar with SQL to use Hive. I am not sure this change helps with either of those two motivations. Most
          of the commercial databases return an int result for integer division, not decimal (Oracle, SQL Server, DB2, Postgres).

          Xuefu Zhang added a comment -

          1. The changes to floating point arithmetic are not backward compatible, and there is no SQL compliance benefit for that.

          The main reason is to be in line with MySQL and simplify the implementation. It could be kept in a backward-compatible manner.

          2.1 It will not be backward compatible with some UDF implementations (I believe this is the same with the change in floating point return type).

          The SQL standard says that exact-type division should result in an exact type. Double is non-compliant. Changing to the int type has the same issue you're referring to.

          2.2 Integer arithmetic becoming NULL in some cases

          First, I don't think there is any standard saying that integer operations should not emit NULL.

          NULL is generated when an error occurs (such as overflow, divide by zero, etc.). Currently, emitting NULL is one of the few error-handling options modern databases have, but it is the only one that Hive has, though Hive isn't consistent. I'd argue generating NULL values is worse than generating bad or wrong values in error cases. To make things worse, the user is not aware of it. (Take HIVE-5660 as an example.)

          We may introduce a different server mode to configure different error handling (HIVE-5438).

          2.3 more than 50x performance degradation for the arithmetic operation

          The 50x performance degradation came from a unit test, which doesn't necessarily represent Hive's overall performance. Hive's performance will not be judged solely by int/int. The bigger question is: do we need something that's working and right, or something that's wrong but fast? Performance can be improved down the road, but functionality deviations are hard to correct, as has been demonstrated in this discussion.

          Backward compatibility is a valid concern. However, the question is whether Hive is at a point where it has to be kept at any cost, or whether we are willing to sacrifice some of it to achieve something we think is right.

          I have seen arguments that put implementation over functionality and performance over correctness, which are, in my opinion, ill-constructed.
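          The NULL-on-error semantics discussed above can be sketched as follows. This is an illustrative model in Python, not Hive code, and `hive_divide` is a hypothetical name:

```python
def hive_divide(x, y):
    """Model of Hive's error handling for division: rather than raising
    an error, conditions such as a NULL operand or divide-by-zero yield
    NULL (represented here as None) -- silently, from the user's view."""
    if x is None or y is None or y == 0:
        return None
    return x / y

print(hive_divide(6, 3))  # 2.0
print(hive_divide(1, 0))  # None -- the error is swallowed into NULL
```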

          Xuefu Zhang added a comment -

          I'd argue generating NULL value is worse than generating bad or wrong values in error cases. To make things worse, user is not aware of that.

          I have to quote myself here. I meant generating NULL values is no worse than ...

          Xuefu Zhang added a comment -

          One thing that perplexed me is that vectorization is a non-functional, implementation-level optimization, yet it is closely coupled with functionality. That is, function-level changes can suddenly stop vectorization from working. This is something that's beyond my expectation, given my limited understanding.

          Of course, this is a different topic, but I thought I'd share my thoughts here as well.

          Eric Hanson added a comment -

          Vectorization has to implement a specific semantics for an operation. So if the semantics change, the vectorized implementation of the operation must be changed too, or the operation could either fail to vectorize or give wrong results.
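          One way to picture that coupling: a planner can only vectorize an operator when a vectorized implementation exists for that operator's exact type signature, so changing an operator's result type silently drops the query back to row mode. A minimal hypothetical sketch (the registry contents and implementation names are made up for illustration, not Hive's actual dispatch code):

```python
# Hypothetical registry mapping (operator, left type, right type) to a
# vectorized implementation; any signature not listed falls back to
# row-mode execution.
VECTORIZED_OPS = {
    ("divide", "int", "int"): "IntColDivideIntColumn",  # illustrative name
}

def plan(op, left_type, right_type):
    """Return the execution mode chosen for an operator invocation."""
    impl = VECTORIZED_OPS.get((op, left_type, right_type))
    return ("vectorized", impl) if impl else ("row-mode", None)

print(plan("divide", "int", "int"))      # vectorized
print(plan("divide", "int", "decimal"))  # no decimal impl -> row-mode
```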

          Sergey Shelukhin added a comment -

          Maybe some special test can be added for this? Run a set of simple queries with and without vectorization, and ensure the results are the same. That would catch the problem before commits.

          Sergey Shelukhin added a comment -

          Filed HIVE-6010

          Eric Hanson added a comment -

          HIVE-5996 has a pretty useful, in-depth, related discussion about ANSI compatibility vs. backward compatibility, Hive evolution etc. that is good context.


            People

            • Assignee: Xuefu Zhang
            • Reporter: Xuefu Zhang
            • Votes: 0
            • Watchers: 7
