Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version/s: 2.0.0
- None
- None
Description
With a DataFrame read from a CSV file that contains 1776 columns, Spark fails to compile its generated code with Janino, reporting:
Constant pool has grown past JVM limit of 0xFFFF
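For context on the message: per the JVM specification, a class file stores `constant_pool_count` (the number of constant pool entries plus one) in an unsigned 16-bit field, so no single class, generated or not, can use more than 0xFFFF (65535) pool slots. The Scala sketch below reads that field from raw class-file bytes to make the 16-bit ceiling concrete. The `ConstantPoolHeader` object and its members are hypothetical; only the header layout comes from the specification.

```scala
import java.nio.ByteBuffer

// Hypothetical helper: reads constant_pool_count from the fixed class-file header.
// JVM spec layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version,
// u2 constant_pool_count, then the constant pool itself.
object ConstantPoolHeader {
  // Largest value an unsigned 16-bit (u2) field can hold: 65535
  val MaxConstantPoolCount: Int = 0xFFFF

  /** Returns constant_pool_count (number of pool entries + 1) from raw class-file bytes. */
  def constantPoolCount(classBytes: Array[Byte]): Int = {
    require(classBytes.length >= 10, "class-file header is at least 10 bytes")
    val buf = ByteBuffer.wrap(classBytes) // ByteBuffer defaults to big-endian, matching class files
    require((buf.getInt(0) & 0xFFFFFFFFL) == 0xCAFEBABEL, "missing 0xCAFEBABE magic number")
    // getShort is signed; masking with 0xFFFF reinterprets the u2 field as unsigned
    buf.getShort(8) & 0xFFFF
  }
}
```

Passing it the output of `java.nio.file.Files.readAllBytes` on any compiled `.class` file shows how close that class is to the limit.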
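Running a plain select over all columns works fine:
import org.apache.spark.sql.functions.col

// df: the DataFrame read from the 1776-column CSV file
// Alias every column and select them all; this succeeds.
val allCols = df.columns.map(c => col(c).as(c + "_alias"))
val newDf = df.select(allCols: _*)
newDf.show()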
But when I invoke the describe method:
// describe takes column names (String*), not Column objects
newDf.describe(newDf.columns: _*)
it fails with the following stack trace:
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
    at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    ... 30 more
Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
    at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
    at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
    at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
    at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
    at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
    at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
    at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
    at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
    at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
    at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
    at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
    at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
    at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
    at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
    at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
    at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
    at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
    at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
    ....
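A rough calculation helps explain why `describe` fails where the plain `select` succeeds. Assuming Spark 2.0's `describe` computes count, mean, stddev, min and max for every requested column (the statistics its documentation lists), it builds five aggregate expressions per column, 8880 for 1776 columns, versus 1776 aliases for the select. If those expressions all land in one generated class, as the stack trace suggests, they can average only about seven constant-pool entries each before reaching 0xFFFF. The sketch below just does that arithmetic; the object and method names are hypothetical.

```scala
// Back-of-envelope sketch; names are hypothetical.
// Assumption: Spark 2.0's Dataset.describe computes these five statistics per column.
object DescribeWidth {
  val Statistics: Seq[String] = Seq("count", "mean", "stddev", "min", "max")
  val ConstantPoolLimit: Int = 0xFFFF // 65535

  /** Aggregate expressions describe() builds: one per (statistic, column) pair. */
  def aggregateExpressions(numColumns: Int): Int = numColumns * Statistics.size

  /** Average constant-pool entries per expression before a single class overflows.
    * Ignores every other constant the class needs, so the real budget is smaller. */
  def entriesPerExpression(numColumns: Int): Int =
    ConstantPoolLimit / aggregateExpressions(numColumns)
}

// For the 1776-column file in this report:
//   DescribeWidth.aggregateExpressions(1776) == 8880
//   DescribeWidth.entriesPerExpression(1776) == 7
```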
Issue Links
- duplicates: SPARK-16845 "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB (Resolved)
- is related to: SPARK-18016 Code Generation: Constant Pool Past Limit for Wide/Nested Dataset (Resolved)
Yeah, that makes sense. So far, the issue I filed and this one seem to be the only JIRAs that specifically exhibit the constant pool limit error. I'm digging deeper to see whether it really forms its own class of error. Given that SPARK-17702 didn't resolve the error case I posted (even though it splits up sections of large generated code), I suspect the issues are closely related but ultimately different. I think the splitExpressions technique used in SPARK-17702, which also appears to be employed in SPARK-16845, could be useful for the range of generated classes that can produce too much code. Seeing the issues linked together is definitely useful.

To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now, until I can try the patch it produces, so we can see more conclusively whether they are related issues or true duplicates. I'll also link the two "0xFFFF" issues together as related.
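To make the splitting idea concrete, here is a simplified, hypothetical sketch of the general technique: group a long list of generated statements into blocks under a size budget, emit one private helper method per block, and call the helpers in order from a driver method. It is not Spark's `CodegenContext.splitExpressions`; the `SplitSketch` object, the `part_N` helper names and the character budget are assumptions for illustration. Splitting this way keeps each method small, which targets the 64 KB per-method limit behind SPARK-16845, but every helper still lives in the same class and shares its one constant pool.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, hypothetical illustration of splitting generated code into helper methods.
// This is NOT Spark's CodegenContext.splitExpressions; names and the size budget are assumptions.
object SplitSketch {
  /** Groups code fragments into blocks whose combined length stays within maxBlockChars.
    * A single fragment longer than the budget still gets a block of its own. */
  def groupIntoBlocks(fragments: Seq[String], maxBlockChars: Int): Seq[Seq[String]] = {
    val blocks = ArrayBuffer.empty[Seq[String]]
    val current = ArrayBuffer.empty[String]
    var currentLen = 0
    for (f <- fragments) {
      // Close the current block if adding this fragment would exceed the budget
      if (current.nonEmpty && currentLen + f.length > maxBlockChars) {
        blocks += current.toList
        current.clear()
        currentLen = 0
      }
      current += f
      currentLen += f.length
    }
    if (current.nonEmpty) blocks += current.toList
    blocks.toList
  }

  /** Emits one private Java helper per block, plus an apply() method that calls them in order. */
  def splitIntoMethods(fragments: Seq[String], maxBlockChars: Int): String = {
    val blocks = groupIntoBlocks(fragments, maxBlockChars)
    val helpers = blocks.zipWithIndex.map { case (block, idx) =>
      s"private void part_$idx() {\n  ${block.mkString("\n  ")}\n}"
    }
    val calls = blocks.indices.map(idx => s"  part_$idx();").mkString("\n")
    (helpers :+ s"public void apply() {\n$calls\n}").mkString("\n\n")
  }
}
```

Because the helpers are added to the same class, the total number of constants does not shrink (each helper even adds a method reference), which is consistent with splitting alone not curing the 0xFFFF error.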