Hive
  1. Hive
  2. HIVE-5895

vectorization handles division by zero differently from normal execution

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.13.0
    • Component/s: None
    • Labels:
      None

      Description

      Produces infinity, not NULL

      1. HIVE-5895.patch
        55 kB
        Sergey Shelukhin
      2. HIVE-5895.03.patch
        67 kB
        Sergey Shelukhin
      3. HIVE-5895.02.patch
        67 kB
        Sergey Shelukhin
      4. HIVE-5895.01.patch
        66 kB
        Sergey Shelukhin

        Activity

        Hide
        Eric Hanson added a comment -

        Committed to trunk. Thanks Sergey!

        Show
        Eric Hanson added a comment - Committed to trunk. Thanks Sergey!
        Hide
        Hive QA added a comment -

        Overall: +1 all checks pass

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12616877/HIVE-5895.03.patch

        SUCCESS: +1 4450 tests passed

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/503/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/503/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        

        This message is automatically generated.

        ATTACHMENT ID: 12616877

        Show
        Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12616877/HIVE-5895.03.patch SUCCESS: +1 4450 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/503/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/503/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12616877
        Hide
        Eric Hanson added a comment -

        +1

        Show
        Eric Hanson added a comment - +1
        Hide
        Hive QA added a comment -

        Overall: +1 all checks pass

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12616854/HIVE-5895.02.patch

        SUCCESS: +1 4450 tests passed

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/502/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/502/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        

        This message is automatically generated.

        ATTACHMENT ID: 12616854

        Show
        Hive QA added a comment - Overall : +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12616854/HIVE-5895.02.patch SUCCESS: +1 4450 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/502/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/502/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase This message is automatically generated. ATTACHMENT ID: 12616854
        Hide
        Eric Hanson added a comment -

        E.g. you could have a vectorized row batch with 1024 entries, and only items (rows) 0 and 2 are selected. Then if you do an operation like col/0, the result column will be repeating and null. But only items 0 and 2 will be selected. The selected array tells you which rows in the batch are selected.

        Show
        Eric Hanson added a comment - E.g. you could have a vectorized row batch with 1024 entries, and only items (rows) 0 and 2 are selected. Then if you do an operation like col/0, the result column will be repeating and null. But only items 0 and 2 will be selected. The selected array tells you which rows in the batch are selected.
        Hide
        Eric Hanson added a comment -

        Yes.

        Show
        Eric Hanson added a comment - Yes.
        Hide
        Sergey Shelukhin added a comment -

        Are you sure selectedInUse and isRepeating are compatible? I'd have to check quite a bit of code to confirm

        Show
        Sergey Shelukhin added a comment - Are you sure selectedInUse and isRepeating are compatible? I'd have to check quite a bit of code to confirm
        Hide
        Eric Hanson added a comment -

        See my comments on RB

        Show
        Eric Hanson added a comment - See my comments on RB
        Hide
        Sergey Shelukhin added a comment -

        fixed unit test; enforced ordering in the q test, appears to be different on hadoop1 and 2 otherwise

        Show
        Sergey Shelukhin added a comment - fixed unit test; enforced ordering in the q test, appears to be different on hadoop1 and 2 otherwise
        Hide
        Sergey Shelukhin added a comment -

        hmm, seemed to have passed on my box yesterday... let me update

        Show
        Sergey Shelukhin added a comment - hmm, seemed to have passed on my box yesterday... let me update
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12616674/HIVE-5895.01.patch

        ERROR: -1 due to 2 failed/errored test(s), 4450 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0
        org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorScalarColArithmetic.testScalarLongDivide
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/493/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/493/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 2 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12616674

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12616674/HIVE-5895.01.patch ERROR: -1 due to 2 failed/errored test(s), 4450 tests executed Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_div0 org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorScalarColArithmetic.testScalarLongDivide Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/493/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/493/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12616674
        Hide
        Sergey Shelukhin added a comment -

        so far, limited the test for cases where deletion actually vectorizes

        Show
        Sergey Shelukhin added a comment - so far, limited the test for cases where deletion actually vectorizes
        Hide
        Sergey Shelukhin added a comment -

        writing the test as it is now depends on HIVE-5918 and HIVE-5919, the patch itself doesn't

        Show
        Sergey Shelukhin added a comment - writing the test as it is now depends on HIVE-5918 and HIVE-5919 , the patch itself doesn't
        Hide
        Eric Hanson added a comment -

        See my new comments on ReviewBoard

        Show
        Eric Hanson added a comment - See my new comments on ReviewBoard
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12615978/HIVE-5895.patch

        ERROR: -1 due to 2 failed/errored test(s), 4741 tests executed
        Failed tests:

        org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorArithmeticExpressions.testLongColDivideLongColumn
        org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorScalarColArithmetic.testScalarLongDivide
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/470/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/470/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 2 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12615978

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12615978/HIVE-5895.patch ERROR: -1 due to 2 failed/errored test(s), 4741 tests executed Failed tests: org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorArithmeticExpressions.testLongColDivideLongColumn org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorScalarColArithmetic.testScalarLongDivide Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/470/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/470/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed This message is automatically generated. ATTACHMENT ID: 12615978
        Hide
        Eric Hanson added a comment -

        This is looking good overall. Please see my review comments.

        Show
        Eric Hanson added a comment - This is looking good overall. Please see my review comments.
        Hide
        Sergey Shelukhin added a comment -
        Show
        Sergey Shelukhin added a comment - rb at https://reviews.apache.org/r/15869/
        Hide
        Sergey Shelukhin added a comment -

        Xuefu Zhang infinity can be returned from valid double division if the value is too large to be represented iirc. So it would be a valid value

        Show
        Sergey Shelukhin added a comment - Xuefu Zhang infinity can be returned from valid double division if the value is too large to be represented iirc. So it would be a valid value
        Hide
        Sergey Shelukhin added a comment -

        here's the patch with test

        Show
        Sergey Shelukhin added a comment - here's the patch with test
        Hide
        Xuefu Zhang added a comment -

        I understand that the current division can possibly return infinity or NaN. However, this seems to me a bug (or bad behavior) rather than norm. To me, they provide no more information than NULL. Such complication comes due to hive having no consistent error handling. HIVE-5348 and HIVE-5660 are to address those issues. For this particular case, I wouldn't be crying if NULL is returned for both infinity or NaN, especially if it makes the life easier.

        Show
        Xuefu Zhang added a comment - I understand that the current division can possibly return infinity or NaN. However, this seems to me a bug (or bad behavior) rather than norm. To me, they provide no more information than NULL. Such complication comes due to hive having no consistent error handling. HIVE-5348 and HIVE-5660 are to address those issues. For this particular case, I wouldn't be crying if NULL is returned for both infinity or NaN, especially if it makes the life easier.
        Hide
        Sergey Shelukhin added a comment -

        Patch later today...

        Show
        Sergey Shelukhin added a comment - Patch later today...
        Hide
        Sergey Shelukhin added a comment -

        Actually regular division can also produce infinities according to the spec, so in order to preserve the logic that avoids the null-value-setting loop when batch has no nulls, one would have to also pass in input to do the if for entire batch anyway if there's at least one 0 on the right side. So there will be a lot of code change for this case.

        Show
        Sergey Shelukhin added a comment - Actually regular division can also produce infinities according to the spec, so in order to preserve the logic that avoids the null-value-setting loop when batch has no nulls, one would have to also pass in input to do the if for entire batch anyway if there's at least one 0 on the right side. So there will be a lot of code change for this case.
        Hide
        Eric Hanson added a comment -

        Sounds good. It would be best to make it conditional in the template expansion or a once-per-vector check so extra work for each vector element is only done for the divide operator.

        Show
        Eric Hanson added a comment - Sounds good. It would be best to make it conditional in the template expansion or a once-per-vector check so extra work for each vector element is only done for the divide operator.
        Hide
        Sergey Shelukhin added a comment -

        probably it can piggyback on the pass already done by NullUtil

        Show
        Sergey Shelukhin added a comment - probably it can piggyback on the pass already done by NullUtil
        Hide
        Eric Hanson added a comment -

        On second thought, the "counting" would require a conditional anyway. So just a final pass to conditionally convert infinity to NULL for division should be fine.

        Show
        Eric Hanson added a comment - On second thought, the "counting" would require a conditional anyway. So just a final pass to conditionally convert infinity to NULL for division should be fine.
        Hide
        Sergey Shelukhin added a comment -

        Yeah, that's what I'm doing

        Show
        Sergey Shelukhin added a comment - Yeah, that's what I'm doing
        Hide
        Eric Hanson added a comment -

        For division only, consider doing a final pass over the vector and count the infinity values. If it is 0, do nothing. This is the common case.

        If it is non-zero, then do another pass and conditionally convert the infinities to NULL.

        This avoids an if/else check in the inner loop for the common case.

        Eric

        Show
        Eric Hanson added a comment - For division only, consider doing a final pass over the vector and count the infinity values. If it is 0, do nothing. This is the common case. If it is non-zero, then do another pass and conditionally convert the infinities to NULL. This avoids an if/else check in the inner loop for the common case. Eric

          People

          • Assignee:
            Sergey Shelukhin
            Reporter:
            Sergey Shelukhin
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development