  1. Kylin
  2. KYLIN-744 Spark Cube Build Engine
  3. KYLIN-1094

Improve performance of Spark cubing

Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: v1.4.0
    • Fix Version/s: Backlog
    • Component/s: Spark Engine
    • Labels: None

    Description

      A POC of Spark cubing shows that, on a dataset of 150 million records, MapReduce (MR) is about 100% faster than Spark, that is, roughly twice as fast. However, we believe Spark can be at least as fast as MR, so optimization is needed here.
      We are now asking the Spark community for help.

      Cluster info:
      VMs: 8 nodes * (128 GB memory + 64 cores)
      Hadoop cluster: HDP 2.2.6
      Spark running mode: yarn-client
      Spark version: 1.5.1
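
For a sense of scale, the cluster figures above can be worked through as below. This is only a back-of-the-envelope sketch: the function names and the per-executor sizing (16 GB and 5 cores) are hypothetical values chosen for illustration, not settings taken from this issue, and the calculation ignores whatever YARN and the OS reserve on each node.

```python
# Rough capacity arithmetic for the cluster described in this issue.
# The node figures come from the issue; the executor sizing is an assumption.

NODES = 8
MEM_GB_PER_NODE = 128
CORES_PER_NODE = 64


def cluster_capacity(nodes, mem_gb_per_node, cores_per_node):
    """Return the raw memory and core totals across all nodes."""
    return {
        "total_mem_gb": nodes * mem_gb_per_node,
        "total_cores": nodes * cores_per_node,
    }


def max_executors(nodes, mem_gb_per_node, cores_per_node, exec_mem_gb, exec_cores):
    """Upper bound on executors: each node fits as many as its scarcer resource allows."""
    per_node = min(mem_gb_per_node // exec_mem_gb, cores_per_node // exec_cores)
    return nodes * per_node


capacity = cluster_capacity(NODES, MEM_GB_PER_NODE, CORES_PER_NODE)
print(capacity)  # {'total_mem_gb': 1024, 'total_cores': 512}

# Hypothetical sizing: 16 GB and 5 cores per executor.
print(max_executors(NODES, MEM_GB_PER_NODE, CORES_PER_NODE, 16, 5))  # 64
```

With this assumed sizing, memory rather than cores is the limiting resource on each node (8 executors by memory versus 12 by cores), which is the kind of imbalance worth checking when comparing Spark against MR on the same hardware.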

          People

            lidong_sjtu Dong Li
            qhzhou Qianhao Zhou