Uploaded image for project: 'Kylin'
  1. Kylin
  2. KYLIN-4260

When using server side PreparedStatement cache, the query result are not match on TopN scenario

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v3.0.0-alpha2, v2.6.4
    • Fix Version/s: Future, v2.6.5
    • Component/s: Query Engine
    • Labels:
      None

      Description

      Hi Kylin team,

      I found an issue while server side PreparedStatement enabled. The second time query and after's result will be different from the first when query TopN, and the result is not right.

      Part of Cube info:

       
      Dimensions
      TRANS_ID
      PART_DT
      SELLER_ID
      BUYER_ID

      Measures:
      SUM(PRICE)
      MAX(PRICE)
      TOPN(PRICE) Group By:KYLIN_SALES.SELLER_ID,KYLIN_SALES.BUYER_ID

       

      SQL:

      {"sql":"select seller_id, buyer_id, sum(PRICE) from glaucus.kylin_sales where PART_DT >= ? and PART_DT <= ? group by seller_id, buyer_id order by sum(PRICE) desc limit 20","project":"DDTFORTEST_Analytics", "params":[{"className": "java.lang.String","value": "2012-01-01"},{"className": "java.lang.String","value": "2012-01-10"}]}
      

      The First query result:

      The Second and after:

       -----------------------------------------------

      Root Cause

      Cached preparedContext is changed when doing preparedStatement.executeQuery, and losing groupByColumns. So the first execution result is correct, the second and the after will be incorrect.

      Real Root Cause

      The first time we entered PreparedStatement logic, we'll try to borrow preparedContext from cache pool, of course there isn't any, but the cache pool will execute create method to create a new preparedContext, and then loaned it to us.

      I didn't figure out how adjustSqlDigest works before, and try to remove code

      sqlDigest.groupbyColumns.removeAll(topnLiteralCol)

      but it's not right. Top-N isn't like some other measures, the dimensions aren't as part of row key, they stored in measures in design, so it's why the adjustSqlDigest matters, especially those codes.

      sqlDigest.groupbyColumns.removeAll(topnLiteralCol);
      sqlDigest.metricColumns.addAll(topnLiteralCol);
      

      The root cause for this issue is because the create sql digest is execute again after we store it in cache, so the digest changed.

      # This is from the first time
      fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=TOP_N, parameter=GLAUCUS.KYLIN_SALES.PRICE,GLAUCUS.KYLIN_SALES.SELLER_ID,GLAUCUS.KYLIN_SALES.BUYER_ID, returnType=topn(5000,8)]].
      
      # This is the second one
      fact table GLAUCUS.KYLIN_SALES,group by [],filter on [GLAUCUS.KYLIN_SALES.PART_DT],with aggregates[FunctionDesc [expression=SUM, parameter=GLAUCUS.KYLIN_SALES.PRICE, returnType=decimal(19,4)]].
      
      

      So the second time and after we execute the same query or same pattern, the expression will be changed to SUM instead of TOPN, that's why the strange result show up.

        Attachments

        1. image-2019-11-18-15-55-00-312.png
          128 kB
          Marc Wu
        2. image-2019-11-18-15-55-11-906.png
          297 kB
          Marc Wu
        3. image-2019-11-18-19-29-34-489.png
          426 kB
          Marc Wu
        4. image-2019-11-18-19-29-42-721.png
          404 kB
          Marc Wu

          Activity

            People

            • Assignee:
              mwu_ Marc Wu
              Reporter:
              mwu_ Marc Wu
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: