Kylin / KYLIN-3729

CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: v2.5.2
    • Fix Version/s: v2.6.0
    • Component/s: Job Engine
    • Labels:
      None

      Description

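      As we know, the global dictionary is a sliced AppendTrieTree whose slices are loaded through a cache loader. So when values are converted to IDs with the global dictionary, processing them in sorted order helps: consecutive lookups fall into the same slice, and fewer slices have to be loaded (or reloaded after eviction).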
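      The effect of input order on slice loading can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Kylin's implementation: `SlicedDictModel` and `count_loads` are hypothetical names, each slice is assumed to cover a fixed number of sorted keys, and the cache loader is modeled as a small LRU cache where every miss counts as one slice load.

```python
from collections import OrderedDict
import bisect
import random


class SlicedDictModel:
    """Hypothetical toy model of a sliced, string-ordered dictionary whose
    slices are loaded on demand through a small LRU cache (not Kylin's API)."""

    def __init__(self, sorted_keys, slice_size, cache_slices):
        # Each slice covers a contiguous run of the sorted keys; keep the
        # first key of every slice so a lookup can find its slice by bisection.
        self.slice_starts = sorted_keys[::slice_size]
        self.cache = OrderedDict()
        self.cache_slices = cache_slices
        self.loads = 0  # number of cache misses, i.e. simulated slice loads

    def lookup(self, key):
        idx = bisect.bisect_right(self.slice_starts, key) - 1
        if idx in self.cache:
            self.cache.move_to_end(idx)  # hit: mark slice as recently used
        else:
            self.loads += 1  # miss: the slice has to be loaded
            self.cache[idx] = True
            if len(self.cache) > self.cache_slices:
                self.cache.popitem(last=False)  # evict least recently used slice
        return idx


def count_loads(values, slice_size=100, cache_slices=4):
    """Build the model over all distinct values, then look values up in the given order."""
    model = SlicedDictModel(sorted(set(values)), slice_size, cache_slices)
    for v in values:
        model.lookup(v)
    return model.loads


values = [f"user{i:06d}" for i in range(10_000)]
shuffled = values[:]
random.Random(42).shuffle(shuffled)

print(count_loads(sorted(values)))  # 100: each of the 100 slices is loaded once
print(count_loads(shuffled))        # many more: slices are evicted and reloaded
```

      In sorted order every slice is loaded exactly once; in random order, with only a few slices cached at a time, most lookups miss and force a reload.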

      Currently, we can set kylin.source.hive.flat-table-cluster-by-dict-column to the UHC column so that the source data is written with CLUSTER BY on that column, which already helps.

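      However, the AppendTrieTree is ordered by string, and for a non-string column (for example, an integer ID) string order differs from natural order: numerically 9 < 10, but as strings "10" < "9". So we can CLUSTER BY CAST(uhc-column AS STRING) to get the most benefit.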
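      The mismatch between numeric and string order can be checked with a short sketch. It assumes, as a simplification, that the dictionary is split into 100 slices bounded by string keys and counts how often consecutive lookups move to a different slice (with a one-slice cache, each switch would be a load). `slice_switches` is a hypothetical helper, and Python's `str(int)` stands in for Hive's CAST(... AS STRING).

```python
import bisect


def slice_switches(lookup_order, slice_starts):
    """Count how often consecutive lookups land in a different slice (toy model)."""
    switches = 0
    prev = None
    for value in lookup_order:
        # Slices are bounded by string keys, so locate the slice by str(value).
        idx = bisect.bisect_right(slice_starts, str(value)) - 1
        if idx != prev:
            switches += 1
            prev = idx
    return switches


ids = list(range(1, 10_001))
slice_starts = sorted(str(i) for i in ids)[::100]  # 100 slices in string order

numeric_order = sorted(ids)          # order within a reducer for CLUSTER BY on an INT column
string_order = sorted(ids, key=str)  # order within a reducer for CLUSTER BY CAST(... AS STRING)

print(sorted([9, 10, 100], key=str))                # [10, 100, 9]
print(slice_switches(string_order, slice_starts))   # 100: each slice visited once
print(slice_switches(numeric_order, slice_starts))  # more: numeric order jumps between slices
```

      Sorting by the string form visits the slices in the same order as the dictionary stores them, which is why the cast matters for non-string UHC columns.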

      We can see that HDFS bytes read (most of which is global dictionary data) dropped to 30% of the previous value.

        Attachments

        1. image-2018-12-19-12-01-20-430.png
          42 kB
          Fangyuan Deng
        2. image-2018-12-19-12-02-08-913.png
          48 kB
          Fangyuan Deng
        3. KYLIN-3729.1.patch
          1 kB
          Fangyuan Deng


            People

            • Assignee: Fangyuan Deng (whisper_deng)
            • Reporter: Fangyuan Deng (whisper_deng)
            • Votes: 0
            • Watchers: 5
