Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
None
Description
Consider this example.
Preparation:
create temporary table opens as ( select stack(1, 1 , cast('2019-11-13 08:07:28' as timestamp) ) as (id , load_ts ) );
Queries: This is just about counting the number of rows, with filters always matching, and possibly sort by. 1 is always expected.
select count(*) from ( select * from opens) t; select count(*) from ( select * from opens sort by id) t; select count(*) from ( select * from opens where load_ts >= '2019-11-13 08:07:00' ) t; select count(*) from ( select * from opens where load_ts >= '2019-11-13 08:07:00' sort by id) t; select count(*) from ( select * from opens where load_ts <= '2019-11-13 09:07:00') t; select count(*) from ( select * from opens where load_ts <= '2019-11-13 09:07:00' sort by id) t;
The last query (sort by and <= on timestamp) returns 0 rows.
I believe that this is the cause of other issues I have, where I have missing rows in queries with the timestamp (but not the explicit sort by).
Note that if instead of a temporary table I use a CTE for opens, the issue does not appear.
I tried workarounds (inverse order of operands, adding not or not not ) to no avail.
One thing that did work is to explicitly cast the string to a timestamp:
select count(*) from ( select * from opens where load_ts <= cast('2019-11-13 09:07:00' as timestamp) sort by id) t;
It might be good practice indeed, but there still is a discrepancy between how >= and <= are handled, or how sort by works.
Note: this is on Hive from [hdp3.1.4|https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/release-notes/content/patch_hive.html] , without llap.