Hive
  1. Hive
  2. HIVE-6508

Mismatched results between vector and non-vector mode with decimal field

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.13.0, 0.14.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      Following query has a little mismatch in result as compared to the non-vector mode.

      select d_year, i_brand_id, i_brand,
             sum(ss_ext_sales_price) as sum_agg
      from date_dim
      join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
      join item on store_sales.ss_item_sk = item.i_item_sk
      where i_manufact_id = 128
        and d_moy = 11
      group by d_year, i_brand, i_brand_id
      order by d_year, sum_agg desc, i_brand_id
      limit 100;
      

      This query is on tpcds data.
      The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer.

      1. HIVE-6508.1.patch
        41 kB
        Remus Rusanu
      2. HIVE-6508.1.patch
        41 kB
        Jitendra Nath Pandey

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Patch Available Patch Available Open Open
        14h 2m 1 Jitendra Nath Pandey 07/Mar/14 02:32
        Open Open Patch Available Patch Available
        8d 5h 1m 2 Jitendra Nath Pandey 07/Mar/14 02:33
        Patch Available Patch Available Resolved Resolved
        1d 20h 31m 1 Jitendra Nath Pandey 08/Mar/14 23:05
        Resolved Resolved Closed Closed
        249d 20h 36m 1 Thejas M Nair 13/Nov/14 19:42
        Thejas M Nair made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hide
        Thejas M Nair added a comment -

        This has been fixed in 0.14 release. Please open new jira if you see any issues.

        Show
        Thejas M Nair added a comment - This has been fixed in 0.14 release. Please open new jira if you see any issues.
        Hide
        Jitendra Nath Pandey added a comment -

        Committed to branch-0.13

        Show
        Jitendra Nath Pandey added a comment - Committed to branch-0.13
        Jitendra Nath Pandey made changes -
        Fix Version/s 0.13.0 [ 12324986 ]
        Fix Version/s 0.14.0 [ 12326450 ]
        Jitendra Nath Pandey made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Jitendra Nath Pandey added a comment -

        Committed to trunk. It is a correctness bug, therefore I will port it to hive-13 branch as well.

        Show
        Jitendra Nath Pandey added a comment - Committed to trunk. It is a correctness bug, therefore I will port it to hive-13 branch as well.
        Hide
        Remus Rusanu added a comment -

        The failure is unrelated to the patch

        Show
        Remus Rusanu added a comment - The failure is unrelated to the patch
        Hide
        Hive QA added a comment -

        Overall: -1 at least one tests failed

        Here are the results of testing the latest attachment:
        https://issues.apache.org/jira/secure/attachment/12633294/HIVE-6508.1.patch

        ERROR: -1 due to 1 failed/errored test(s), 5374 tests executed
        Failed tests:

        org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
        

        Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1660/testReport
        Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1660/console

        Messages:

        Executing org.apache.hive.ptest.execution.PrepPhase
        Executing org.apache.hive.ptest.execution.ExecutionPhase
        Executing org.apache.hive.ptest.execution.ReportingPhase
        Tests exited with: TestsFailedException: 1 tests failed
        

        This message is automatically generated.

        ATTACHMENT ID: 12633294

        Show
        Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12633294/HIVE-6508.1.patch ERROR: -1 due to 1 failed/errored test(s), 5374 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1660/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1660/console Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed This message is automatically generated. ATTACHMENT ID: 12633294
        Hide
        Remus Rusanu added a comment -

        Sergey Shelukhin There is a new test case testSumDecimalHive6508 the covers possible regression

        Show
        Remus Rusanu added a comment - Sergey Shelukhin There is a new test case testSumDecimalHive6508 the covers possible regression
        Hide
        Sergey Shelukhin added a comment -

        It might make sense to add a test to ./ql/src/test/queries/clientcompare

        Show
        Sergey Shelukhin added a comment - It might make sense to add a test to ./ql/src/test/queries/clientcompare
        Jitendra Nath Pandey made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jitendra Nath Pandey made changes -
        Attachment HIVE-6508.1.patch [ 12633294 ]
        Hide
        Jitendra Nath Pandey added a comment -

        Same patch re-uploaded to trigger pre-commit build.

        Show
        Jitendra Nath Pandey added a comment - Same patch re-uploaded to trigger pre-commit build.
        Jitendra Nath Pandey made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Hide
        Jitendra Nath Pandey added a comment -

        +1. The patch looks good to me.

        Show
        Jitendra Nath Pandey added a comment - +1. The patch looks good to me.
        Remus Rusanu made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Remus Rusanu made changes -
        Attachment HIVE-6508.1.patch [ 12633112 ]
        Hide
        Remus Rusanu added a comment -

        The value 0 comes in the input vector unscaled (scale 0). As aggregates (SUM, STDxx) are being updated, they us the scale of the input value, not the scale of the input column. So any 0 in the input will round the intermediate fractional part of the intermediate. Final result is off. AVG uses a special scale so is not affected. MIN/MAX use the input value scale, but has no side effects. Fix is to pass in the column scale explictly, rather than assume the input value scale has the column scale. Ultimately the behavior of passing in unscaled 0s is wrong, but this comes from the row-mode join modus-operandi and I don't want to change that. Hardening the aggregates against this case is more robust.

        Show
        Remus Rusanu added a comment - The value 0 comes in the input vector unscaled (scale 0). As aggregates (SUM, STDxx) are being updated, they us the scale of the input value, not the scale of the input column. So any 0 in the input will round the intermediate fractional part of the intermediate. Final result is off. AVG uses a special scale so is not affected. MIN/MAX use the input value scale, but has no side effects. Fix is to pass in the column scale explictly, rather than assume the input value scale has the column scale. Ultimately the behavior of passing in unscaled 0s is wrong, but this comes from the row-mode join modus-operandi and I don't want to change that. Hardening the aggregates against this case is more robust.
        Remus Rusanu made changes -
        Field Original Value New Value
        Description Following query has a little mismatch in result as compared to the non-vector mode.
        {code}
        select d_year, i_brand_id, i_brand,
               sum(ss_ext_sales_price) as sum_agg
        from date_dim
        join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
        join item on store_sales.ss_item_sk = item.i_item_sk
        where i_manufact_id = 128
          and d_moy = 11
        group by d_year, i_brand, i_brand_id
        order by d_year, sum_agg desc, i_brand_id
        limit 100;
        {code}
        This query is on tpcds data.
        The field
        ss_ext_sales_price is of type decimal(7,2) and everything else
        is an integer.
        Following query has a little mismatch in result as compared to the non-vector mode.
        {code}
        select d_year, i_brand_id, i_brand,
               sum(ss_ext_sales_price) as sum_agg
        from date_dim
        join store_sales on date_dim.d_date_sk = store_sales.ss_sold_date_sk
        join item on store_sales.ss_item_sk = item.i_item_sk
        where i_manufact_id = 128
          and d_moy = 11
        group by d_year, i_brand, i_brand_id
        order by d_year, sum_agg desc, i_brand_id
        limit 100;
        {code}
        This query is on tpcds data.
        The field ss_ext_sales_price is of type decimal(7,2) and everything else is an integer.
        Remus Rusanu created issue -

          People

          • Assignee:
            Remus Rusanu
            Reporter:
            Remus Rusanu
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development