Kylin: KYLIN-2243

TopN memory estimation is inaccurate in some cases


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v2.0.0
    • Component/s: None
    • Labels: None

    Description

      TopNCounterSerializer.maxLength() and TopNCounterSerializer.getStorageBytesEstimate() can be inaccurate, especially when a TopN measure has multiple "group by" columns and some of them use a long encoding such as "fixed_length:16".
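      To see why several long group-by columns make the estimate drift, the sketch below assumes that a TopN key is the concatenation of the encoded group-by values, so its length is the sum of the per-column encoding lengths. The class and method names (TopNKeyLength, columnBytes, keyBytes) are hypothetical, and only the "fixed_length:N" encoding mentioned in this issue is handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; this is not Kylin code.
public class TopNKeyLength {
    // Default per-key size used when the TopN data type carries no key length.
    static final int DEFAULT_KEY_BYTES = 4;

    // Bytes taken by one group-by column. Only "fixed_length:N" is handled here.
    static int columnBytes(String encoding) {
        String prefix = "fixed_length:";
        if (!encoding.startsWith(prefix)) {
            throw new IllegalArgumentException("encoding not handled in this sketch: " + encoding);
        }
        return Integer.parseInt(encoding.substring(prefix.length()).trim());
    }

    // Assumption: the TopN key concatenates all group-by columns,
    // so the key length is the sum of the per-column lengths.
    static int keyBytes(List<String> encodings) {
        int total = 0;
        for (String encoding : encodings) {
            total += columnBytes(encoding);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> groupBy = Arrays.asList("fixed_length:16", "fixed_length:8");
        int actual = keyBytes(groupBy);
        System.out.println("actual key bytes:  " + actual);            // 24
        System.out.println("assumed key bytes: " + DEFAULT_KEY_BYTES); // 4
    }
}
```

      Under that assumption, these two columns give 24-byte keys, six times the 4-byte default, so the key portion of any size derived from the default is underestimated by the same factor.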

      An inaccurate estimate can cause memory problems when in-memory cubing is used, and it also makes the estimated size of the final cube inaccurate.

      The root cause is that a data type such as "top(100)" carries no information about how long a key can be. Until now a default of 4 bytes has been used, which is too small when the encoding is something like "fixed_length:16". The solution is to extend the data type expression to "top(100, 16)", indicating that one key can be up to 16 bytes long. If the "scale" is absent, 4 bytes is used as the default.
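      A minimal sketch of the proposed syntax, using a simple regular-expression parser rather than Kylin's actual data type parsing: "top(N)" falls back to a 4-byte key, while "top(N, scale)" uses the declared key length. The size formula (N entries, each holding the key plus an assumed 8-byte counter) is for illustration only and is not the real formula in TopNCounterSerializer; the names TopNDataType, estimateBytes and COUNTER_BYTES are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the "top(N[, scale])" syntax; not Kylin's parser.
public class TopNDataType {
    // Key length used when the scale is omitted, as described in the issue.
    static final int DEFAULT_SCALE = 4;
    // Assumption: each entry also stores an 8-byte (double) counter.
    static final int COUNTER_BYTES = 8;

    private static final Pattern TOP =
            Pattern.compile("\\s*top\\s*\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*");

    final int precision; // N: how many top entries are kept
    final int scale;     // maximum bytes of one key

    TopNDataType(int precision, int scale) {
        this.precision = precision;
        this.scale = scale;
    }

    static TopNDataType parse(String expr) {
        Matcher m = TOP.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a TopN data type: " + expr);
        }
        int precision = Integer.parseInt(m.group(1));
        // Fall back to the 4-byte default when no scale is given.
        int scale = (m.group(2) == null) ? DEFAULT_SCALE : Integer.parseInt(m.group(2));
        return new TopNDataType(precision, scale);
    }

    // Rough size estimate: N entries of (key + counter) bytes.
    long estimateBytes() {
        return (long) precision * (scale + COUNTER_BYTES);
    }

    public static void main(String[] args) {
        System.out.println(parse("top(100)").estimateBytes());     // 1200
        System.out.println(parse("top(100, 16)").estimateBytes()); // 2400
    }
}
```

      Under these assumptions, declaring 16-byte keys doubles the estimate for the same N. A real serializer may add further overhead, such as spare capacity in the counter, which this sketch leaves out.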


          People

            Assignee: Shao Feng Shi (shaofengshi)
            Reporter: Shao Feng Shi (shaofengshi)
            Votes: 0
            Watchers: 4
