  Spark / SPARK-25987

StackOverflowError when executing many operations on a table with many columns


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"

    Description

      When I execute the following code:

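      import org.apache.spark.sql._
      import org.apache.spark.sql.functions.udf
      import org.apache.spark.sql.types._
      
      // `spark` is an existing SparkSession (for example, the one spark-shell provides).
      val columnsCount = 100
      val columns = (1 to columnsCount).map(i => s"col$i")
      val initialData = (1 to columnsCount).map(i => s"val$i")
      
      // A single-row DataFrame with 100 nullable string columns.
      val df = spark.createDataFrame(
        rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
        schema = StructType(columns.map(StructField(_, StringType, true)))
      )
      
      // UDF that appends "_added" to a string value.
      val addSuffixUDF = udf(
        (str: String) => str + "_added"
      )
      
      // Applies the UDF to every column, keeping the original column names.
      implicit class DFOps(df: DataFrame) {
        def addSuffix(): DataFrame = {
          df.select(columns.map(col =>
            addSuffixUDF(df(col)).as(col)
          ): _*)
        }
      }
      
      // Three chained projections, each wrapping every column in the UDF.
      df.addSuffix().addSuffix().addSuffix().show()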
      

      I get the following error:

      An exception or error caused a run to abort.
      java.lang.StackOverflowError
       at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
       at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
      ...
      

      If I reduce the number of columns (to 10, for example) or call `addSuffix` only once, it works fine.
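      To narrow down where the failure starts, the reproduction can be parameterized over the column count and the number of chained `addSuffix` calls. The sketch below is a hypothetical harness (the `SuffixRepro` object and its `buildDf`, `addSuffix` and `runs` helpers are not part of the report); it rebuilds the same one-row DataFrame and UDF, and it assumes the `StackOverflowError` is thrown in the calling thread, as it was when it aborted the run above. Thresholds will likely differ across Spark versions, JVMs and thread stack sizes.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical harness, not from the report: same data and UDF as the
// reproduction, but with the column count and chain depth as parameters.
object SuffixRepro {
  private val addSuffixUDF = udf((str: String) => str + "_added")

  // One-row DataFrame with `columnsCount` nullable string columns.
  def buildDf(spark: SparkSession, columnsCount: Int): DataFrame = {
    val columns = (1 to columnsCount).map(i => s"col$i")
    val initialData = (1 to columnsCount).map(i => s"val$i")
    spark.createDataFrame(
      spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
      StructType(columns.map(StructField(_, StringType, nullable = true)))
    )
  }

  // Wraps every column in the UDF, keeping the column names.
  def addSuffix(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => addSuffixUDF(df(c)).as(c)): _*)

  // Applies addSuffix `depth` times, then runs the query.
  // Returns false only if a StackOverflowError reaches this thread.
  def runs(spark: SparkSession, columnsCount: Int, depth: Int): Boolean = {
    val start = buildDf(spark, columnsCount)
    val result = (1 to depth).foldLeft(start)((acc, _) => addSuffix(acc))
    try { result.collect(); true }
    catch { case _: StackOverflowError => false }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25987-repro")
      .getOrCreate()
    try {
      for (cols <- Seq(10, 100); depth <- 1 to 3)
        println(s"columns=$cols depth=$depth runs=${runs(spark, cols, depth)}")
    } finally spark.stop()
  }
}
```

      Running `main` prints one line per combination. Going by the report, the 10-column cases and the single-call 100-column case should succeed, while 100 columns with three chained calls is the configuration that overflowed on the versions listed.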


          People

            Assignee: Unassigned
            Reporter: Ivan Tsukanov (itsukanov)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: