Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Version: 3.1.0
Description
Issue: Running the query
select * from <table> limit n
from Spark via the Hive Warehouse Connector may return more than "n" rows.
This happens because the "get_splits" UDF creates splits while ignoring the LIMIT constraint. When these splits are submitted to multiple LLAP daemons, each daemon returns up to "n" rows, so the merged result can contain more than "n" rows.
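The over-count can be illustrated without a cluster. The sketch below is a plain-Python simulation, not HWC code: `table`, `splits`, and `run_split` are hypothetical stand-ins for the table data, the output of `get_splits`, and an LLAP daemon executing one split. It shows how applying LIMIT per split instead of once on the merged result can return up to (number of daemons) x n rows.

```python
# Simulation of the bug: each split (standing in for an LLAP daemon)
# applies LIMIT n independently, so the merged result can exceed n rows.
# All names here are hypothetical illustrations, not actual HWC APIs.

table = [1, 2, 3, 4, 5, 6]        # rows remaining after the delete in the repro
splits = [table[:3], table[3:]]   # get_splits hands one slice to each daemon

def run_split(rows, limit):
    """Buggy behavior: the daemon enforces the LIMIT locally."""
    return rows[:limit]

n = 1
buggy = [row for split in splits for row in run_split(split, n)]
print(len(buggy))   # 2: one row per daemon, though the query asked for 1

# The correct behavior: enforce the limit once, on the merged result.
fixed = buggy[:n]
print(len(fixed))   # 1
```

With two splits and `limit 1`, the merged result holds two rows, matching the symptom described above.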
How to reproduce: requires spark-shell, the Hive Warehouse Connector, and Hive on LLAP with more than one LLAP daemon running.
Run the commands below via beeline to create and populate the table:
create table test (id int);
insert into table test values (1);
insert into table test values (2);
insert into table test values (3);
insert into table test values (4);
insert into table test values (5);
insert into table test values (6);
insert into table test values (7);
delete from test where id = 7;
Now running the query below via spark-shell:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("select * from test limit 1").show()
will return more than one row.
Attachments
Issue Links
- relates to: HIVE-23336 "HIVE-23230 follow up: Fix get_llap_udf skipped unit tests" (Open)