  1. Kylin
  2. KYLIN-4342

Build Global Dict by MR/Hive New Version

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v3.1.0
    • Component/s: None
    • Labels: None
    • Sprint: Sprint 50

    Description

      The current implementation of the global dictionary through MR/Hive has two limitations and some distributed concurrency lock bugs:
      1. It is limited by Hive's global order by sort in the shuffle stage, so memory use and build time become uncontrollable once the data volume reaches the billion level. In our test on a column with cardinality at the 800 million level, configured with 15 GB of memory, the dictionary build step took more than 10 hours.
      2. Multiple global dictionary columns are computed serially.
      3. There are some distributed concurrency lock bugs.

      We have improved the original version. The general idea of the new version is the same as the previous MR/Hive implementation ([MR/Hive V1|http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html]): complete the global dictionary encoding through Hive or MR, then replace the original values in the flat table with the dictionary-encoded values.
      However, the new version adds two MR steps, "parallel part build" and "parallel total build", to replace the original "build dict" step and so remove the two limitations above. It also uses ZK to fix the distributed concurrency lock bugs.
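
      The issue does not spell out what the two new steps do internally, so the following is only a minimal sketch of one plausible reading, not Kylin's actual code. It assumes that "parallel part build" hashes values into buckets and numbers each bucket locally, and that "parallel total build" shifts those local ids by per-bucket offsets to make them globally unique. All names here (NUM_BUCKETS, part_build, total_build, encode_column) are hypothetical, and a thread pool stands in for the MR reducers.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_BUCKETS = 4  # hypothetical; plays the role of the reducer count


def bucket_of(value: str) -> int:
    # Stable hash, so a given value always lands in the same bucket.
    return crc32(value.encode("utf-8")) % NUM_BUCKETS


def part_build(values):
    # Assumed "part build": dedup one bucket and number its values from 1.
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def total_build(part_dicts):
    # Assumed "total build": shift each bucket's local ids by the total
    # size of the buckets before it (done serially here for simplicity).
    global_dict, offset = {}, 0
    for part in part_dicts:
        for value, local_id in part.items():
            global_dict[value] = offset + local_id
        offset += len(part)
    return global_dict


def build_global_dict(column_values):
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for v in column_values:
        buckets[bucket_of(v)].append(v)
    # Buckets are independent, so no global sort is required.
    with ThreadPoolExecutor() as pool:
        part_dicts = list(pool.map(part_build, buckets))
    return total_build(part_dicts)


def encode_column(column_values, global_dict):
    # Replace original values in the flat table column with dictionary ids.
    return [global_dict[v] for v in column_values]
```

      Under these assumptions, each bucket only sorts its own values, which is why a single global order by is no longer needed; separate dictionary columns could likewise be built independently of each other.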

          People

            Assignee: wangxiaojing
            Reporter: wangxiaojing
            Votes: 0
            Watchers: 5