  1. Kylin
  2. KYLIN-5571

Calculating the data size during query pushdown takes too long, which makes the queries unstoppable.

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0-alpha
    • Fix Version/s: 5.0-beta
    • Component/s: Query Engine
    • Labels: None

    Description

      When a query is pushed down, KE tries to calculate the size of the data involved in order to set the number of Spark partitions. If there are too many files on HDFS, this calculation takes a long time to complete.

      To improve this situation, the following changes will be made:

      1. Use a limited thread pool to calculate the data size.
      2. Add a timeout to the calculation, so that the query can be stopped as soon as possible.
      3. Add new properties:
        kylin.query.pushdown.auto-set-shuffle-partitions-multiple=3, the default Spark partition number
        kylin.query.pushdown.auto-set-shuffle-partitions-timeout=30, the maximum time, in seconds (30 by default), allowed for calculating the data size used to adjust the Spark partition number

      After these changes, a query can be expected to complete within a bounded duration.
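
      The combination of a limited thread pool and a timeout could look roughly like the following. This is only a minimal sketch, not the code from the actual fix: the class name BoundedSizeCalculator, the method totalSize, and the use of plain Callable<Long> tasks as stand-ins for the per-path size lookups on HDFS are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; the name and structure are not taken from the Kylin code base.
public class BoundedSizeCalculator {

    /**
     * Sums the results of the given size lookups using at most `threads` workers.
     * Each Callable stands in for one per-path size lookup (an assumption).
     * Throws TimeoutException if the lookups do not all finish within the timeout.
     */
    public static long totalSize(List<Callable<Long>> lookups, int threads, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task is done or the timeout expires;
            // tasks that have not completed by then are cancelled.
            List<Future<Long>> futures =
                    pool.invokeAll(lookups, timeoutMillis, TimeUnit.MILLISECONDS);
            long total = 0L;
            for (Future<Long> future : futures) {
                if (future.isCancelled()) {
                    throw new TimeoutException("data size calculation timed out");
                }
                total += future.get();
            }
            return total;
        } finally {
            // Interrupt any workers that are still running so the query can stop promptly.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Long>> fast = new ArrayList<>();
        fast.add(() -> 100L);
        fast.add(() -> 200L);
        fast.add(() -> 300L);
        System.out.println(totalSize(fast, 2, 1000)); // prints 600

        List<Callable<Long>> slow = new ArrayList<>();
        slow.add(() -> { Thread.sleep(5000); return 1L; });
        try {
            totalSize(slow, 2, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out"); // prints "timed out"
        }
    }
}
```

      In this sketch the timeout applies to the calculation as a whole rather than to each lookup, which matches the goal of bounding how long a query can be held up. In a real deployment the timeout would presumably come from kylin.query.pushdown.auto-set-shuffle-partitions-timeout (30 seconds by default) rather than being passed in milliseconds as it is here.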

          People

            Assignee: newboy Guangyuan Feng
            Reporter: newboy Guangyuan Feng