Description
GenerateMutableProjection puts the code for all expression columns into a single apply function. When there are many columns, the compiled apply method exceeds the 64 KB bytecode limit, which is a hard per-method limit in the JVM and cannot be changed.
This came up when we were aggregating about 100 columns with codegen and the unsafe feature enabled.
I wrote a unit test that reproduces this issue:
https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
This test currently fails at 2048 expressions. master seems to be more tolerant of this than branch-1.4 because the code is more concise.
Although the code on master has changed since branch-1.4, I am able to reproduce the problem on master. For now I have hacked around the problem in branch-1.4 by wrapping each expression in a separate function and then calling those functions sequentially in apply. The proper fix is probably to check the length of projectCode and break it up as necessary. (This actually seems easier on master, since the code is built there as strings rather than with quasiquotes.)
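To make the proposed approach concrete, here is a minimal sketch of breaking up the projection code by length. It is not Spark's implementation: the class name, the `apply_N` helper names, the `mutableRow` field, and the character budget are all hypothetical, and it assumes each per-expression snippet is self-contained and writes its result to a field rather than a local variable. Source length is only a rough proxy for bytecode size, so a real budget would need to be conservative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group per-expression code snippets into helper methods
// so that no single generated method grows too large.
class SplitProjectionSketch {

    /** Groups snippets so each group holds at most roughly maxChars of source text. */
    static List<String> chunk(List<String> snippets, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String snippet : snippets) {
            // Start a new chunk when adding this snippet would exceed the budget.
            if (current.length() > 0 && current.length() + snippet.length() > maxChars) {
                chunks.add(current.toString());
                current = new StringBuilder();
            }
            current.append(snippet).append('\n');
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Emits one helper method per chunk, plus an apply method that calls them in order. */
    static String generate(List<String> snippets, int maxChars) {
        List<String> chunks = chunk(snippets, maxChars);
        StringBuilder helpers = new StringBuilder();
        StringBuilder calls = new StringBuilder();
        for (int i = 0; i < chunks.size(); i++) {
            helpers.append("private void apply_").append(i).append("(Object row) {\n")
                   .append(chunks.get(i))
                   .append("}\n");
            calls.append("  apply_").append(i).append("(row);\n");
        }
        return helpers.toString()
            + "public Object apply(Object row) {\n"
            + calls.toString()
            + "  return mutableRow;\n" // mutableRow is an assumed field of the generated class
            + "}\n";
    }

    public static void main(String[] args) {
        List<String> snippets = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            // Placeholder snippets; real ones would come from each expression's generated code.
            snippets.add("  mutableRow.update(" + i + ", eval" + i + "(row));");
        }
        System.out.println(generate(snippets, 80));
    }
}
```

Wrapping every expression in its own function, as in the branch-1.4 workaround, corresponds to a budget small enough that each chunk holds a single snippet. Grouping by length keeps the number of extra method calls down when the individual expressions are small.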
Let me know if anyone has additional thoughts on this. I'm happy to contribute a pull request.
Attaching the stack trace produced by the unit test:
[info] - code size limit *** FAILED *** (7 seconds, 103 milliseconds)
[info]   com.google.common.util.concurrent.UncheckedExecutionException: org.codehaus.janino.JaninoRuntimeException: Code of method "(Ljava/lang/Object;)Ljava/lang/Object;" of class "SC$SpecificProjection" grows beyond 64 KB
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2263)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:285)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:50)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:48)
[info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
[info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
[info]   at scala.collection.immutable.Range.foreach(Range.scala:141)
[info]   at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
[info]   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:105)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply$mcV$sp(CodeGenerationSuite.scala:47)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuite.run(FunSuite.scala:1555)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
[info]   Cause: org.codehaus.janino.JaninoRuntimeException: Code of method "(Ljava/lang/Object;)Ljava/lang/Object;" of class "SC$SpecificProjection" grows beyond 64 KB
[info]   at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
[info]   at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
[info]   at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
[info]   at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)
[info]   at org.codehaus.janino.UnitCompiler.compileBoolean2(UnitCompiler.java:2862)
[info]   at org.codehaus.janino.UnitCompiler.access$4800(UnitCompiler.java:185)
[info]   at org.codehaus.janino.UnitCompiler$8.visitAmbiguousName(UnitCompiler.java:2832)
[info]   at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:3138)
[info]   at org.codehaus.janino.UnitCompiler.compileBoolean(UnitCompiler.java:2842)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1741)
[info]   at org.codehaus.janino.UnitCompiler.access$1200(UnitCompiler.java:185)
[info]   at org.codehaus.janino.UnitCompiler$4.visitIfStatement(UnitCompiler.java:937)
[info]   at org.codehaus.janino.Java$IfStatement.accept(Java.java:2157)
[info]   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:958)
[info]   at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1007)
[info]   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2293)
[info]   at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:822)
[info]   at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:794)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:507)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:658)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:662)
[info]   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:185)
[info]   at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:350)
[info]   at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1035)
[info]   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:354)
[info]   at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:769)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:532)
[info]   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:393)
[info]   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:185)
[info]   at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:347)
[info]   at org.codehaus.janino.Java$PackageMemberClassDeclaration.accept(Java.java:1139)
[info]   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:354)
[info]   at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:322)
[info]   at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:383)
[info]   at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:315)
[info]   at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:233)
[info]   at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:192)
[info]   at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:84)
[info]   at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:77)
[info]   at org.codehaus.janino.ClassBodyEvaluator.<init>(ClassBodyEvaluator.java:72)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.compile(CodeGenerator.scala:245)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:87)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:29)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:272)
[info]   at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
[info]   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
[info]   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:285)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:50)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2$$anonfun$apply$mcV$sp$2.apply(CodeGenerationSuite.scala:48)
[info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
[info]   at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
[info]   at scala.collection.immutable.Range.foreach(Range.scala:141)
[info]   at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
[info]   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:105)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply$mcV$sp(CodeGenerationSuite.scala:47)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGenerationSuite$$anonfun$2.apply(CodeGenerationSuite.scala:47)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
[info]   at scala.collection.immutable.List.foreach(List.scala:318)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1424)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
[info]   at org.scalatest.FunSuite.run(FunSuite.scala:1555)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
Issue Links
- is duplicated by
  - SPARK-9058 if set `spark.sql.codegen` is true,More than 100 aggregation operation, it exceeds JVM code size limits (Closed)
- is related to
  - SPARK-14138 Generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames (Resolved)
  - SPARK-14793 Code generation for large complex type exceeds JVM size limit. (Resolved)