Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-3722

Error Limit Push Down in Join Related Query

    XMLWordPrintableJSON

Details

    Description

      Kylin limit-pushdown is sometimes cause data reduction.

      For example:

      select uid, sum(active_minutes) as am
      from useraction
      where item_id in (
          select distinct item_id
          from iteminfo
          where item_type in ('Video')
      ) and act_type != 'share'
      group by uid
      limit 10

      In hive, we got correct result(Five row).

      hive>
      > select uid, sum(active_minutes) as am
      > from useraction
      > where item_id in (
      > select distinct item_id
      > from iteminfo
      > where item_type in ('Video')
      > ) and act_type != 'share'
      > group by uid
      > limit 10;
      Query ID = root_20181216170145_d5667a81-46d0-4899-a4bb-7c580155049e
      Total jobs = 1
      Launching Job 1 out of 1

      Status: Running (Executing on YARN cluster with App id application_1539833412107_0414)

      --------------------------------------------------------------------------------
      VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
      --------------------------------------------------------------------------------
      Map 1 .......... SUCCEEDED 1 1 0 0 0 0
      Map 3 .......... SUCCEEDED 1 1 0 0 0 0
      Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
      Reducer 4 ...... SUCCEEDED 1 1 0 0 0 0
      --------------------------------------------------------------------------------
      VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 7.67 s
      --------------------------------------------------------------------------------
      OK
      1 14565.470000000008
      2 64744.89000000003
      3 64939.01999999984
      5 36563.76999999997
      6 36641.64999999999
      Time taken: 11.02 seconds, Fetched: 5 row(s)

      In Kylin, same query got error result(only THREE row). But when you set limit to 50000(original value). It is OK.

       

       

      We can find following things in log:

       

      KYLIN [ DEBUG ] 12-16 17:04:28.299 org.apache.kylin.storage.gtrecord.GTCubeStorageQueryBase.enableStorageLimitIfPossible(GTCubeStorageQueryBase.java:433) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
      > storageLimitLevel set to LIMIT_ON_RETURN_SIZE because groupD is not clustered at head, groupsD: {0} with cuboid columns: {1}

      KYLIN [ INFO ] 12-16 17:04:28.299 org.apache.kylin.storage.StorageContext.applyLimitPushDown(StorageContext.java:167) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
      > Enabling limit push down: 10 at level: LIMIT_ON_RETURN_SIZE

       

      KYLIN [ INFO ] 12-16 17:04:28.405 org.apache.kylin.rest.service.QueryService.logQuery(QueryService.java:352) from Query 78808744-8324-3ad4-58ac-93ad7cd8a708-81
      >
      ==========================[QUERY]===============================
      Query Id: 78808744-8324-3ad4-58ac-93ad7cd8a708
      SQL: select uid, sum(active_minutes) as am
      from useraction
      where item_id in (
      select distinct item_id
      from iteminfo
      where item_type in ('Video')
      ) and act_type != 'share'
      group by uid
      User: ADMIN
      Success: true
      Duration: 0.202
      Project: PearVideo
      Realization Names: [CUBE[name=PearVideoCube1], CUBE[name=PearVideoCube1]]
      Cuboid Ids: [14]
      Total scan count: 120
      Total scan bytes: 6442
      Result row count: 3
      Accept Partial: true
      Is Partial Result: false
      Hit Exception Cache: false
      Storage cache used: false
      Is Query Push-Down: false
      Is Prepare: false
      Trace URL: null
      Message: null
      ==========================[QUERY]=============================== 

       

      Execution Plan 

      EXECUTION PLAN BEFORE REWRITE
      OLAPToEnumerableConverter
      OLAPLimitRel(ctx=[], fetch=[1])
      OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[])
      OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[])
      OLAPFilterRel(condition=[<>($1, 'share')], ctx=[])
      OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[])
      OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
      OLAPAggregateRel(group=[\{0}], ctx=[])
      OLAPProjectRel(ITEM_ID=[$0], ctx=[])
      OLAPFilterRel(condition=[=($1, 'Video')], ctx=[])
      OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[], fields=[[0, 1]])

      EXECUTION PLAN AFTER OLAPCONTEXT IS SET
      OLAPToEnumerableConverter
      OLAPLimitRel(ctx=[0@null], fetch=[1])
      OLAPAggregateRel(group=[\{0}], AM=[SUM($1)], ctx=[0@null])
      OLAPProjectRel(UID=[$0], ACTIVE_MINUTES=[$4], ctx=[0@null])
      OLAPFilterRel(condition=[<>($1, 'share')], ctx=[0@null])
      OLAPJoinRel(condition=[=($2, $9)], joinType=[inner], ctx=[0@null])
      OLAPTableScan(table=[[DEMO_USER_ACT, USERACTION]], ctx=[0@null], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
      OLAPAggregateRel(group=[\{0}], ctx=[1@null])
      OLAPProjectRel(ITEM_ID=[$0], ctx=[1@null])
      OLAPFilterRel(condition=[=($1, 'Video')], ctx=[1@null])
      OLAPTableScan(table=[[DEMO_USER_ACT, ITEMINFO]], ctx=[1@null], fields=[[0, 1]])

       

      This bug was reported by Meituan's Dev kangkaisen.

      Attachments

        1. image-2018-12-16-22-58-04-486.png
          83 kB
          Xiaoxiang Yu
        2. image-2018-12-16-22-56-50-607.png
          367 kB
          Xiaoxiang Yu
        3. image-2018-12-16-22-56-27-258.png
          420 kB
          Xiaoxiang Yu
        4. image-2018-12-16-22-55-23-827.png
          418 kB
          Xiaoxiang Yu
        5. image-2018-12-16-22-54-25-718.png
          418 kB
          Xiaoxiang Yu
        6. image-2018-12-16-22-48-25-622.png
          35 kB
          Xiaoxiang Yu
        7. image-2018-12-16-22-46-43-168.png
          6 kB
          Xiaoxiang Yu
        8. image-2018-12-16-22-45-29-841.png
          16 kB
          Xiaoxiang Yu
        9. image-2018-12-16-22-44-32-875.png
          127 kB
          Xiaoxiang Yu
        10. image-2018-12-16-22-40-36-857.png
          19 kB
          Xiaoxiang Yu
        11. image-2018-12-16-17-38-13-454.png
          89 kB
          Xiaoxiang Yu
        12. image-2018-12-16-17-24-21-017.png
          96 kB
          Xiaoxiang Yu
        13. image-2018-12-16-17-06-16-341.png
          113 kB
          Xiaoxiang Yu
        14. 0001-storageLimitLevel-set-to-NO_LIMIT-when-the-query-has.patch
          3 kB
          yuanhanhan

        Issue Links

          Activity

            People

              hit_lacus Xiaoxiang Yu
              hit_lacus Xiaoxiang Yu
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 24h
                  24h
                  Remaining:
                  Remaining Estimate - 24h
                  24h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified