Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
0.8.0
-
None
-
None
Description
Joining on columns of float and double data type produces confusing result. Drill returns the same result as postgres. Part of me feels that we should not follow postgres blindly in this case, removing implicit cast between float and double would be better choice.
At a minimum we should have a section in our documentation that discusses floating point types and this is a good example on how things can go wrong if user does not understand the behavior.
Example of such a discussion in Postgres docs: http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
t1.csv
997322.0399,997322.0399 982209.1438,982209.1438 997322,997322 982209,982209 963548,963548 959310,959310
t2.csv
997322.0399,997322.0399 982209.1438,982209.1438 997322,997322 982209,982209 963548,963548 959310,959310
create table t1(c_float, c_double) as select case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end, case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end from `t1.csv`; create table t2(c_float, c_double) as select case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end, case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end from `t2.csv`; 0: jdbc:drill:schema=dfs> select * from t1; +------------+------------+ | c_float | c_double | +------------+------------+ | 997322.06 | 997322.0399 | | 982209.1 | 982209.1438 | | 997322.0 | 997322.0 | | 982209.0 | 982209.0 | | 963548.0 | 963548.0 | | 959310.0 | 959310.0 | +------------+------------+ 6 rows selected (0.05 seconds) 0: jdbc:drill:schema=dfs> select * from t2; +------------+------------+ | c_float | c_double | +------------+------------+ | 997322.06 | 997322.0399 | | 982209.1 | 982209.1438 | | 997322.0 | 997322.0 | | 982209.0 | 982209.0 | | 963548.0 | 963548.0 | | 959310.0 | 959310.0 | +------------+------------+ 6 rows selected (0.044 seconds)
Implicit cast: looks incorrect, but in fact we can't expect this to work.
0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double; +------------+------------+------------+------------+ | c_float | c_double | c_float0 | c_double0 | +------------+------------+------------+------------+ | 959310.0 | 959310.0 | 959310.0 | 959310.0 | | 963548.0 | 963548.0 | 963548.0 | 963548.0 | | 982209.0 | 982209.0 | 982209.0 | 982209.0 | | 997322.0 | 997322.0 | 997322.0 | 997322.0 | +------------+------------+------------+------------+ 4 rows selected (0.127 seconds)
Explicit cast: same
0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as double) = t2.c_double; +------------+------------+------------+------------+ | c_float | c_double | c_float0 | c_double0 | +------------+------------+------------+------------+ | 959310.0 | 959310.0 | 959310.0 | 959310.0 | | 963548.0 | 963548.0 | 963548.0 | 963548.0 | | 982209.0 | 982209.0 | 982209.0 | 982209.0 | | 997322.0 | 997322.0 | 997322.0 | 997322.0 | +------------+------------+------------+------------+ 4 rows selected (0.136 seconds)