Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-24528

Wrong implicit type conversion when comparing decimals and strings

    XMLWordPrintableJSON

Details

    Description

      In many cases when comparing decimals and strings (literals/columns) the comparison is done using doubles which can create some quite unexpected results in the answers of queries.

      create table t_str (str_col string);
      insert into t_str values ('1208925742523269458163819');
      select * from t_str where str_col=1208925742523269479013976;
      

      The SELECT query brings up one row while the filtering value is not the same with the one present in the string column of the table. The problem is that both types are converted to doubles and due to loss of precision the values are deemed equal.

      The same happens during the join of a decimal with a string type.

      create table t_dec (dec_col decimal(25,0));
      insert into t_dec values (1208925742523269479013976);
      select * from t_dec inner join t_str on dec_col=str_col;
      

      The join result contains one row although the values are not equal.

      Implicit type conversions are working differently in every DBMS and for some of them (e.g., mysql) the above behavior is normal or not allowed at all (e.g. Postgres).

      In the past, Hive used to compare decimal with string columns by converting to decimals but this behavior changed in 2.3.0 (with HIVE-13380). It seems that this behavior change was not intentional since following jiras (e.g., HIVE-18434) imply that comparison of decimals with strings should be done using decimals. Since decimal is an exact numeric it appears a more appropriate type for comparing decimals and strings.

      The goal of this issue is to change the implicit conversion of decimals with strings to doubles and use decimals instead.

      Attachments

        Issue Links

          Activity

            People

              zabetak Stamatis Zampetakis
              zabetak Stamatis Zampetakis
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m