  1. Kylin
  2. KYLIN-4888

Performance optimization of union query with spark engine

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v4.0.0-alpha
    • Fix Version/s: v4.0.0
    • Component/s: Spark Engine
    • Labels: None

    Description

      When a union query runs on the Spark engine, UnionPlan transforms OLAPUnionRel into a Spark DataFrame. When OLAPUnionRel.all = false, Spark's distinct transformation is used, but it is applied inside the loop that traverses the DataFrame collection, so it wraps every intermediate union. As a result we do not get the expected, optimized flattened union plan: Spark's CombineUnions rule optimizes the distinct, but the nested union plan is not flattened, and the Spark DAG ends up with many stages. The distinct transformation should be applied only once, at the end.
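      The two patterns described above can be sketched with the public Spark DataFrame API. This is a minimal illustration, not Kylin's actual UnionPlan code: the object name UnionDistinctSketch, the helper names unionDistinctInLoop and unionDistinctOnce, and the sample inputs are all hypothetical, and the sketch assumes a non-empty list of DataFrames with identical schemas.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical names for illustration only; this is not the Kylin source.
object UnionDistinctSketch {

  // Pattern described in the issue: distinct is applied inside the loop,
  // so every intermediate union is wrapped in its own distinct.
  def unionDistinctInLoop(inputs: Seq[DataFrame]): DataFrame = {
    var df = inputs.head
    for (other <- inputs.tail) {
      df = df.union(other).distinct()
    }
    df
  }

  // Pattern proposed in the issue: union all inputs first,
  // then deduplicate once at the end.
  // Assumes `inputs` is non-empty (reduce throws on an empty Seq).
  def unionDistinctOnce(inputs: Seq[DataFrame]): DataFrame =
    inputs.reduce(_ union _).distinct()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("union-distinct-sketch")
      .getOrCreate()

    // Three small overlapping inputs with the same schema.
    val inputs = Seq(
      spark.range(0, 3).toDF("id"),
      spark.range(2, 5).toDF("id"),
      spark.range(4, 7).toDF("id")
    )

    // Print the logical and physical plans of both variants for comparison.
    unionDistinctInLoop(inputs).explain(true)
    unionDistinctOnce(inputs).explain(true)

    spark.stop()
  }
}
```

      Both helpers return the same set of rows, since deduplicating after each step and deduplicating once at the end yield the same distinct union. Only the shape of the plan differs, which can be inspected in the explain output.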

      Attachments

        1. spark_union_plan_comparison
          18 kB
          Feng Zhu
        2. stages_after.png
          384 kB
          Feng Zhu
        3. stages before.png
          478 kB
          Feng Zhu

          People

            Assignee: fishcus Feng Zhu
            Reporter: fishcus Feng Zhu