Hive
  1. Hive
  2. HIVE-3715

float and double calculation is inaccurate in Hive

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      I found this during debug the e2e test failures. I found Hive miss calculate the float and double value. Take float calculation as an example:

      hive> select f from all100k limit 1;
      48308.98
      hive> select f/10 from all100k limit 1;
      4830.898046875 <--added 04875 in the end
      hive> select f*1.01 from all100k limit 1;
      48792.0702734375 <--should be 48792.0698

      It might be essentially the same problem as http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm. But since e2e test compare the results with mysql and seems mysql does it right, so it is worthy fixing it in Hive.

        Issue Links

          Activity

          Hide
          Navis added a comment -

          HIVE-2586 might be a help for this.

          Show
          Navis added a comment - HIVE-2586 might be a help for this.
          Hide
          Johnny Zhang added a comment -
          Show
          Johnny Zhang added a comment - the reviewboard link https://reviews.apache.org/r/8653/
          Hide
          Kevin Wilfong added a comment -

          Can you do a performance comparison between your new code and the old, I've heard that BigDecimal is very inefficient compared to float and double.

          Show
          Kevin Wilfong added a comment - Can you do a performance comparison between your new code and the old, I've heard that BigDecimal is very inefficient compared to float and double.
          Hide
          Namit Jain added a comment -

          Have you looked at ArciMath BigDecimal - I have not looked at it, but casual browsing suggests it might be faster than BigDecimal.

          Show
          Namit Jain added a comment - Have you looked at ArciMath BigDecimal - I have not looked at it, but casual browsing suggests it might be faster than BigDecimal.
          Hide
          HanCheol Cho added a comment -

          Hi,

          I think I have crushed on this problem too with the following query while testing hive:
          -----------------------------------------------------------------
          explain select
          sum(l_extendedprice * l_discount) as revenue
          from
          lineitem
          where
          l_shipdate >= '1993-01-01'
          and l_shipdate < '1994-01-01'
          and l_discount between 0.05 and (0.06 + 0.01)
          and l_quantity < 25;
          -----------------------------------------------------------------
          The result is wrong and, in the explain result, l_discount's left boundery is 0.05,
          whereas its right boundary is 0.06999999999999999.

          Someone can tell me what is the status of this problem?
          I think that this is a serious problem and should be patched (and merged into
          release), but it is open for years...

          Show
          HanCheol Cho added a comment - Hi, I think I have crushed on this problem too with the following query while testing hive: ----------------------------------------------------------------- explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= '1993-01-01' and l_shipdate < '1994-01-01' and l_discount between 0.05 and (0.06 + 0.01) and l_quantity < 25; ----------------------------------------------------------------- The result is wrong and, in the explain result, l_discount's left boundery is 0.05, whereas its right boundary is 0.06999999999999999. Someone can tell me what is the status of this problem? I think that this is a serious problem and should be patched (and merged into release), but it is open for years...
          Hide
          HanCheol Cho added a comment -

          I have tested Hive 1.1.0-cdh5.4.5

          Show
          HanCheol Cho added a comment - I have tested Hive 1.1.0-cdh5.4.5

            People

            • Assignee:
              Johnny Zhang
              Reporter:
              Johnny Zhang
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development