Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Version: 3.1.0
Description
Issue: Running the query
select * from <table> limit n
from Spark via the Hive Warehouse Connector may return more than "n" rows.
This happens because the "get_splits" UDF creates splits while ignoring the LIMIT constraint. When these splits are submitted to multiple LLAP daemons, each daemon returns up to "n" rows, so the merged result can contain more than "n" rows.
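The over-count can be illustrated without a cluster. The sketch below is a plain-Python simulation, not HWC code: `table`, `splits`, and `run_split` are hypothetical stand-ins for the table data, the output of `get_splits`, and an LLAP daemon executing one split. It shows how applying LIMIT per split instead of once on the merged result can return up to (number of daemons) x n rows.

```python
# Simulation of the bug: each split (standing in for an LLAP daemon)
# applies LIMIT n independently, so the merged result can exceed n rows.
# All names here are hypothetical illustrations, not actual HWC APIs.

table = [1, 2, 3, 4, 5, 6]        # rows remaining after the delete in the repro
splits = [table[:3], table[3:]]   # get_splits hands one slice to each daemon

def run_split(rows, limit):
    """Buggy behavior: the daemon enforces the LIMIT locally."""
    return rows[:limit]

n = 1
buggy = [row for split in splits for row in run_split(split, n)]
print(len(buggy))   # 2: one row per daemon, though the query asked for 1

# The correct behavior: enforce the limit once, on the merged result.
fixed = buggy[:n]
print(len(fixed))   # 1
```

With two splits and `limit 1`, the merged result holds two rows, matching the symptom described above.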
How to reproduce: requires spark-shell, the Hive Warehouse Connector, and Hive on LLAP with more than one LLAP daemon running.
Run the commands below via beeline to create and populate the table:
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;
Now running the query below via spark-shell:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("select * from test limit 1").show()
will return more than one row.
Attachments
Issue Links
- relates to: HIVE-23336 "HIVE-23230 follow up: Fix get_llap_udf skipped unit tests" (Open)