  1. Kylin
  2. KYLIN-3491

Improve the cube building process when using global dictionary

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v2.5.0
    • Component/s: Job Engine
    • Labels:
      None

      Description

      In the current cubing process, when the global dictionary is very large, it is hard to encode raw values into ids for the bitmap input: the raw data records are unsorted, so the dictionary slices are swapped in and out frequently. We need a refined process. The idea is as follows:

      1. For each source data block, a mapper extracts the distinct values and sorts them.
      2. Encode the sorted distinct values and generate a shrunken dictionary for each source data block.
      3. When building the base cuboid, use each source data block's shrunken dictionary for encoding.
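
The three steps above can be illustrated with a minimal sketch. This is not Kylin's implementation: the global dictionary is modeled as a plain in-memory mapping, and the names `build_shrunken_dict` and `encode_block` are hypothetical. It only shows why a per-block dictionary helps: each distinct value is looked up once, in sorted order, and the raw records are then encoded against the small dictionary.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a global dictionary: value -> integer id.
# A real global dictionary is split into slices that are loaded on demand.
GlobalDict = Dict[str, int]


def build_shrunken_dict(block: Iterable[str], global_dict: GlobalDict) -> Dict[str, int]:
    """Steps 1 and 2: collect the block's distinct values, sort them, and
    look each one up once, in sorted order, to build a per-block dictionary."""
    distinct_sorted = sorted(set(block))
    return {value: global_dict[value] for value in distinct_sorted}


def encode_block(block: Iterable[str], shrunken: Dict[str, int]) -> List[int]:
    """Step 3: encode raw records using only the small per-block dictionary."""
    return [shrunken[value] for value in block]


# Toy data, assumed for illustration only.
global_dict = {"apple": 1, "banana": 2, "cherry": 3, "date": 4, "elder": 5}
blocks = [["date", "apple", "date", "banana"], ["elder", "cherry", "elder"]]

encoded = []
for block in blocks:
    shrunken = build_shrunken_dict(block, global_dict)
    encoded.append(encode_block(block, shrunken))
```

Because the lookups against the global dictionary happen in sorted order, each slice would be touched in one contiguous run rather than being swapped in repeatedly as unsorted records arrive.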

        Attachments

        1. APACHE-KYLIN-3491.patch
          53 kB
          Zhong Yanghong
        2. APACHE-KYLIN-3491-with-fix.patch
          53 kB
          Zhong Yanghong

              People

              • Assignee:
                yaho Zhong Yanghong
                Reporter:
                yaho Zhong Yanghong
              • Votes:
                0
                Watchers:
                6
