[SPARK-25987] StackOverflowError when executing many operations on a table with many columns - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 2.2.1, 2.2.2, 2.3.0, 2.3.2, 2.4.0, 2.4.5
Fix Version/s: None
Component/s: SQL
Labels: None
Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"

Description

When I execute

import org.apache.spark.sql._
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._

val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = spark.createDataFrame(
  rowRDD = spark.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
    df.select(columns.map(col =>
      addSuffixUDF(df(col)).as(col)
    ): _*)
  }
}

df.addSuffix().addSuffix().addSuffix().show()

I get

An exception or error caused a run to abort.
java.lang.StackOverflowError
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
 at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...

If I reduce the number of columns (to 10, for example) or call `addSuffix` only once, it works fine.
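To narrow down where the failure starts, the reproduction can be parameterized by column count and chain depth. The following is only a sketch, assuming Spark is on the classpath and a local master is acceptable. The names `StackOverflowProbe`, `columnNames`, `singleRow`, `chain`, and `runs` are hypothetical helpers introduced here and are not part of the original report. It catches `StackOverflowError` explicitly, and the results may depend on the Spark version and the JVM thread stack size.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object StackOverflowProbe {
  // Hypothetical helper: column names col1..colN, as in the report.
  def columnNames(n: Int): Seq[String] = (1 to n).map(i => s"col$i")

  // Same UDF as in the report.
  val addSuffixUDF = udf((str: String) => str + "_added")

  // Builds a one-row DataFrame with `n` nullable string columns.
  def singleRow(spark: SparkSession, n: Int): DataFrame = {
    val values = (1 to n).map(i => s"val$i")
    spark.createDataFrame(
      spark.sparkContext.makeRDD(Seq(Row.fromSeq(values))),
      StructType(columnNames(n).map(StructField(_, StringType, nullable = true)))
    )
  }

  // Applies the UDF to every column, `depth` times in a row.
  def chain(df: DataFrame, depth: Int): DataFrame =
    (1 to depth).foldLeft(df) { (acc, _) =>
      acc.select(acc.columns.map(c => addSuffixUDF(acc(c)).as(c)): _*)
    }

  // Returns true if the chained query runs, false if it throws StackOverflowError.
  def runs(spark: SparkSession, n: Int, depth: Int): Boolean =
    try { chain(singleRow(spark, n), depth).show(); true }
    catch { case _: StackOverflowError => false }

  def main(args: Array[String]): Unit = {
    // Assumption: a single-threaded local master is enough to trigger the issue.
    val spark = SparkSession.builder().master("local[1]").appName("probe").getOrCreate()
    for (n <- Seq(10, 100); depth <- 1 to 3)
      println(s"columns=$n depth=$depth ok=${runs(spark, n, depth)}")
    spark.stop()
  }
}
```

The `(columns=100, depth=3)` combination corresponds to the failing case above, and `(columns=10, ...)` and `(..., depth=1)` correspond to the cases reported to work.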

People

Assignee: Unassigned

Reporter: Ivan Tsukanov

Votes: 0

Watchers: 5

Dates

Created: 09/Nov/18 05:30

Updated: 13/Mar/20 04:59

Resolved: 11/Mar/20 18:28