Spark / SPARK-49950

Spark planning is too slow when LocalRelation holds a lot of data


Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 3.5.1
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None

    Description

      ```scala
      import spark.implicits._

      val data = (0 until 10000000).toArray

      // takes more than 10 s
      val ds = spark.createDataset(data)

      // takes more than 5 s
      ds.selectExpr("value+2 as value").selectExpr("value+3 as value").selectExpr("value+4 as value").selectExpr("value+5 as value")
      ```

       

      This is caused by `LocalRelation`: `mapExpressions` descends into the relation's data and spends a lot of time traversing it, so the planning cost grows with the number of rows. Any ideas on how to fix this?
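
To make the suspected mechanism concrete, here is a minimal toy model in plain Scala. It does not use Spark. It rests on an assumption worth checking against the Spark source: that a generic argument walk such as `mapExpressions` visits every constructor argument of a plan node and recurses into any `Iterable` it finds. All names (`Expr`, `ToyLocalRelation`, `TraversalSketch`, `visited`) are hypothetical stand-ins, not Spark classes.

```scala
// Toy model only: these types are hypothetical and are not Spark classes.
object TraversalSketch {
  // Stand-in for an expression that a rule may rewrite.
  final case class Expr(name: String)

  // Stand-in for a leaf plan node that carries its rows as a constructor argument.
  final case class ToyLocalRelation(output: Seq[Expr], data: Seq[Array[Int]])

  // Counts how many collection elements the generic walk touches.
  var visited: Long = 0L

  // Generic walk: rewrite expressions, recurse into collections, pass everything else through.
  def recursiveTransform(f: Expr => Expr)(arg: Any): Any = arg match {
    case e: Expr => f(e)
    case seq: Iterable[_] =>
      seq.map { x =>
        visited += 1
        recursiveTransform(f)(x)
      }
    case other => other
  }

  // Applies f to every constructor argument of the node (strictly, via toVector).
  def mapExpressions(node: ToyLocalRelation)(f: Expr => Expr): Vector[Any] =
    node.productIterator.map(arg => recursiveTransform(f)(arg)).toVector

  def main(args: Array[String]): Unit = {
    val rows = Vector.tabulate(1000000)(i => Array(i))
    val rel = ToyLocalRelation(Seq(Expr("value")), rows)
    mapExpressions(rel)(e => e)
    // Every row is touched even though no row contains an expression.
    println(s"elements visited: $visited")
  }
}
```

In this sketch a single call touches every row and also rebuilds the row collection, even though the rows contain nothing to rewrite. If the real traversal behaves the same way, each analyzer or optimizer rule that maps expressions over the plan would repeat that linear pass. One possible direction, offered only as an idea and not a confirmed fix, is to keep the rows behind a type that the generic walk treats as opaque.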


          People

            Assignee: Unassigned
            Reporter: xie shuiahu (xieshuaihu)