Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-3715

float and double calculation is inaccurate in Hive

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      I found this during debug the e2e test failures. I found Hive miss calculate the float and double value. Take float calculation as an example:

      hive> select f from all100k limit 1;
      48308.98
      hive> select f/10 from all100k limit 1;
      4830.898046875 <--added 04875 in the end
      hive> select f*1.01 from all100k limit 1;
      48792.0702734375 <--should be 48792.0698

      It might be essentially the same problem as http://effbot.org/pyfaq/why-are-floating-point-calculations-so-inaccurate.htm. But since e2e test compare the results with mysql and seems mysql does it right, so it is worthy fixing it in Hive.

        Issue Links

          Activity

          Hide
          priancho HanCheol Cho added a comment -

          I have tested Hive 1.1.0-cdh5.4.5

          Show
          priancho HanCheol Cho added a comment - I have tested Hive 1.1.0-cdh5.4.5
          Hide
          priancho HanCheol Cho added a comment -

          Hi,

          I think I have crushed on this problem too with the following query while testing hive:
          -----------------------------------------------------------------
          explain select
          sum(l_extendedprice * l_discount) as revenue
          from
          lineitem
          where
          l_shipdate >= '1993-01-01'
          and l_shipdate < '1994-01-01'
          and l_discount between 0.05 and (0.06 + 0.01)
          and l_quantity < 25;
          -----------------------------------------------------------------
          The result is wrong and, in the explain result, l_discount's left boundery is 0.05,
          whereas its right boundary is 0.06999999999999999.

          Someone can tell me what is the status of this problem?
          I think that this is a serious problem and should be patched (and merged into
          release), but it is open for years...

          Show
          priancho HanCheol Cho added a comment - Hi, I think I have crushed on this problem too with the following query while testing hive: ----------------------------------------------------------------- explain select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= '1993-01-01' and l_shipdate < '1994-01-01' and l_discount between 0.05 and (0.06 + 0.01) and l_quantity < 25; ----------------------------------------------------------------- The result is wrong and, in the explain result, l_discount's left boundery is 0.05, whereas its right boundary is 0.06999999999999999. Someone can tell me what is the status of this problem? I think that this is a serious problem and should be patched (and merged into release), but it is open for years...
          Hide
          namit Namit Jain added a comment -

          Have you looked at ArciMath BigDecimal - I have not looked at it, but casual browsing suggests it might be faster than BigDecimal.

          Show
          namit Namit Jain added a comment - Have you looked at ArciMath BigDecimal - I have not looked at it, but casual browsing suggests it might be faster than BigDecimal.
          Hide
          kevinwilfong Kevin Wilfong added a comment -

          Can you do a performance comparison between your new code and the old, I've heard that BigDecimal is very inefficient compared to float and double.

          Show
          kevinwilfong Kevin Wilfong added a comment - Can you do a performance comparison between your new code and the old, I've heard that BigDecimal is very inefficient compared to float and double.
          Hide
          dreambird Johnny Zhang added a comment -
          Show
          dreambird Johnny Zhang added a comment - the reviewboard link https://reviews.apache.org/r/8653/
          Hide
          navis Navis added a comment -

          HIVE-2586 might be a help for this.

          Show
          navis Navis added a comment - HIVE-2586 might be a help for this.

            People

            • Assignee:
              dreambird Johnny Zhang
              Reporter:
              dreambird Johnny Zhang
            • Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:

                Development