Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
all
Description
Kylin limit-pushdown is sometimes cause data reduction.
For example:
select uid, sum(active_minutes) as am
from useraction
where item_id in (
select distinct item_id
from iteminfo
where item_type in ('Video')
) and act_type != 'share'
group by uid
limit 10
In hive, we got correct result(Five row).
hive>
> select uid, sum(active_minutes) as am
> from useraction
> where item_id in (
> select distinct item_id
> from iteminfo
> where item_type in ('Video')
> ) and act_type != 'share'
> group by uid
> limit 10;
Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
Total jobs = 1
Launching Job 1 out of 1Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Map 3 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 4 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
--------------------------------------------------------------------------------
OK
1 14565.470000000008
2 64744.89000000003
3 64939.01999999984
5 36563.76999999997
6 36641.64999999999
Time taken: 11.02 seconds, Fetched: 5 row(s)
In Kylin, same query got error result(only THREE row). But when you set limit to 50000(original value). It is OK.
We can find following things in log:
KYLIN [ DEBUG ] 12-16 17:04:28.299 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not clustered at head, groupsD: {0} with cuboid columns: {1}KYLIN [ INFO ] 12-16 17:04:28.299 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
> Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE
KYLIN [ INFO ] 12-16 17:04:28.405 org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
>
==========================[QUERY]===============================
Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
SQL: select uid, sum(active_minutes) as am
from useraction
where item_id in (
select distinct item_id
from iteminfo
where item_type in ('Video')
) and act_type != 'share'
group by uid
User: ADMIN
Success: true
Duration: 0.202
Project: PearVideo
Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
Cuboid Ids: [14]
Total scan count: 120
Total scan bytes: 6442
Result row count: 3
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================
Execution Plan
EXECUTION PLAN BEFORE REWRITE
OLAPToEnumerableConverter
OLAPLimitRel(ctx=[], fetch=[1])
OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[])
OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[])
OLAPFilterRel(condition=[<>($1, 'share')], ctx=[])
OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[])
OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
OLAPAggregateRel(group=[\{0}], ctx=[])
OLAPProjectRel(ITEM_ID=[$0], ctx=[])
OLAPFilterRel(condition=[=($1, 'Video')], ctx=[])
OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[], fields=[[0, 1]])EXECUTION PLAN AFTER OLAPCONTEXT IS SET
OLAPToEnumerableConverter
OLAPLimitRel(ctx=[0@null], fetch=[1])
OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[0@null])
OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[0@null])
OLAPFilterRel(condition=[<>($1, 'share')], ctx=[0@null])
OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[0@null])
OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[0@null], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
OLAPAggregateRel(group=[\{0}], ctx=[1@null])
OLAPProjectRel(ITEM_ID=[$0], ctx=[1@null])
OLAPFilterRel(condition=[=($1, 'Video')], ctx=[1@null])
OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[1@null], fields=[[0, 1]])
This bug was reported by Meituan's Dev kangkaisen.
Attachments
Attachments
Issue Links
- relates to
-
KYLIN-2841 LIMIT pushdown should be applied to subquery
- Closed
- links to