Kylin / KYLIN-3729

CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: v2.5.2
    • Fix Version/s: v2.6.0
    • Component/s: Job Engine
    • Labels: None

    Description

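      As we know, the global dict is a sliced appendTrieTree whose slices are loaded through a cache loader, so when we convert values to IDs using the global dict, input values that arrive in order help: consecutive lookups tend to hit the slice that is already cached.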
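
      The effect of ordered input on a sliced, cached dictionary can be illustrated with a small simulation. This is only a sketch, not Kylin's implementation: the slice layout, the LRU cache, the key format, and the function name count_slice_loads are all assumptions made for illustration.

```python
import bisect
import random
from collections import OrderedDict


def count_slice_loads(values, slice_starts, cache_size):
    """Count how many times a slice has to be loaded into a small LRU cache.

    slice_starts holds the first key of each slice, in string order.
    (Hypothetical model of a sliced dictionary, not Kylin's actual code.)
    """
    cache = OrderedDict()  # slice index -> marker that the slice is loaded
    loads = 0
    for v in values:
        # Locate the slice whose key range contains v.
        idx = bisect.bisect_right(slice_starts, v) - 1
        if idx in cache:
            cache.move_to_end(idx)  # cache hit: mark as most recently used
        else:
            loads += 1  # cache miss: the slice would be read from storage
            cache[idx] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used slice
    return loads


# Hypothetical data: 10,000 string keys split into 100 slices of 100 keys.
keys = sorted(f"user{i:05d}" for i in range(10_000))
slice_starts = keys[::100]

shuffled = keys[:]
random.Random(0).shuffle(shuffled)

sorted_loads = count_slice_loads(keys, slice_starts, cache_size=4)
shuffled_loads = count_slice_loads(shuffled, slice_starts, cache_size=4)
print(sorted_loads, shuffled_loads)
```

      With input in dictionary order each slice is loaded exactly once; with shuffled input the same slices are evicted and reloaded many times.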

      We can already set kylin.source.hive.flat-table-cluster-by-dict-column to the UHC (ultra-high-cardinality) column so that the source data is CLUSTER BY uhc-column, which improves this.

      But the appendTrieTree is ordered by string value, so we can CLUSTER BY CAST(uhc-column AS STRING) instead to get the most benefit.
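
      Why the cast matters can be seen when the UHC column is numeric: numeric order and string order differ, so rows sorted numerically still jump between slices of a string-ordered dictionary. The sketch below is an illustration under assumptions (integer IDs 1 to 1000, slices of 100 entries, a hypothetical helper slice_switches) and does not reproduce Kylin's or Hive's code.

```python
def slice_switches(values, slice_size):
    """Count how often consecutive lookups land in different slices of a
    dictionary whose entries are sorted as strings (hypothetical model)."""
    dict_order = sorted(str(v) for v in set(values))
    slice_of = {s: i // slice_size for i, s in enumerate(dict_order)}
    switches = 0
    prev = None
    for v in values:
        cur = slice_of[str(v)]
        if prev is not None and cur != prev:
            switches += 1
        prev = cur
    return switches


ids = list(range(1, 1001))

numeric_order = sorted(ids)          # order produced by sorting on the raw column
string_order = sorted(ids, key=str)  # order produced by sorting on the column cast to string

numeric_switches = slice_switches(numeric_order, slice_size=100)
string_switches = slice_switches(string_order, slice_size=100)
print(string_order[:5], numeric_switches, string_switches)
```

      In this model the string-ordered input crosses each slice boundary once, while the numerically ordered input switches slices far more often.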

      We can see that HDFS bytes read (most of which is the global dict) drops to 30%.

      Attachments

        1. image-2018-12-19-12-01-20-430.png
          42 kB
          Fangyuan Deng
        2. image-2018-12-19-12-02-08-913.png
          48 kB
          Fangyuan Deng
        3. KYLIN-3729.1.patch
          1 kB
          Fangyuan Deng


          People

            Assignee: whisper_deng Fangyuan Deng
            Reporter: whisper_deng Fangyuan Deng