Kylin / KYLIN-4035

Calculate column cardinality by using spark engine

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: v3.0.0-alpha2
    • Component/s: Spark Engine
    • Labels: None
    • Environment: kylin: master/3.0.0-alpha
      spark: 2.4.3
      hadoop: 2.6.5

    Description

      Kylin calculates column cardinality when it loads a Hive table. This stage is currently supported only by the MR engine, not by Spark. I think the Spark engine should also be available for this stage, for the following reasons:

      1) Kylin users can choose which engine to use when calculating column cardinality;

      2) Useful Spark features (e.g. dynamic resource allocation) become available;

      3) The Spark implementation is simple.
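
For readers unfamiliar with this stage, the following is a minimal sketch of what a column cardinality calculation does: each partition of the table collects the distinct values it sees per column, and the partial results are then merged. It is plain Python rather than Spark code, and it is not taken from the Kylin source. The function names, the sample rows, and the choice to skip NULL values are all assumptions made for illustration. It also uses exact sets, whereas a job over a large table would more likely rely on an approximate counter such as HyperLogLog.

```python
from functools import reduce


def partition_distinct(rows, columns):
    """Map step: collect the distinct values per column within one partition."""
    seen = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is not None:  # assumption: NULL values are not counted
                seen[col].add(value)
    return seen


def merge_distinct(left, right):
    """Reduce step: union the per-column sets of two partial results."""
    return {col: left[col] | right[col] for col in left}


def column_cardinality(partitions, columns):
    """Return {column: number of distinct values} across all partitions."""
    partials = [partition_distinct(rows, columns) for rows in partitions]
    empty = {col: set() for col in columns}
    merged = reduce(merge_distinct, partials, empty)
    return {col: len(values) for col, values in merged.items()}


# Hypothetical sample data: two partitions of a small table.
partitions = [
    [{"country": "CN", "city": "Beijing"}, {"country": "CN", "city": "Shanghai"}],
    [{"country": "US", "city": "Boston"}, {"country": "CN", "city": "Beijing"}],
]
result = column_cardinality(partitions, ["country", "city"])
print(result)  # {'country': 2, 'city': 3}
```

The per-partition step followed by a pairwise merge is the shape that distributed engines parallelize, which is one reason an implementation on Spark can stay short.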

      I have finished this work and tested it successfully. To enable it, "kylin.engine.spark-cardinality=true" must be added to kylin.properties (the default is false). I look forward to your suggestions.
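
As a sketch, the corresponding entry in kylin.properties would look like the fragment below. The property name and its default come from the description above. Whether it applies to a given build depends on the fix version of this issue.

```properties
# kylin.properties
# Opt in to calculating column cardinality with the Spark engine.
# Per this issue, the default is false.
kylin.engine.spark-cardinality=true
```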

      Best regards. 

            People

              majic31@163.com Jack
              majic31@163.com Jack
              Votes: 0
              Watchers: 5