Spark / SPARK-17131

Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      When reading a CSV file that contains 1776 columns, Spark and Janino fail to generate the code, with the message:

      Constant pool has grown past JVM limit of 0xFFFF
      

      When running a common select with all columns it's fine:

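            import org.apache.spark.sql.functions.col

            // Alias every column; df is the DataFrame read from the CSV file.
            val allCols = df.columns.map(c => col(c).as(c + "_alias"))
            val newDf = df.select(allCols: _*)
            newDf.show()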
      

      But when I invoke the describe method:

      // describe takes column names (String*), not Column objects
      newDf.describe(newDf.columns: _*)
      

      it fails with the following stack trace:

      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
      	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
      	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
      	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
      	... 30 more
      Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
      	at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
      	at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
      	at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
      	at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
      	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
      	at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
      	at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
      	at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
      	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
      	at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
      	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
      	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
      	at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
      	at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
      	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
      	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
      	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
      	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
      	at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
      	at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
      	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
      	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
      	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
      ....
      

          Activity

            aeskilson Aleksander Eskilson added a comment - edited

            Yeah, that makes sense. So far, what I documented and this one seem to be the only JIRAs that specifically exhibit the Constant Pool limit error. I'm trying to dig deeper into it to see if it really marks its own class of error, but given that SPARK-17702 didn't resolve the error case I posted (even though it splits up sections of large generated code), I do suspect they are closely related but ultimately different issues. I think the splitExpressions technique that was used in SPARK-17702, and that also appears to be employed in SPARK-16845, could be useful for the range of different classes that can generate too many lines of code. Seeing the issues linked together is definitely useful.

            To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now until I can make use of the patch it develops, so we can see more conclusively if they're related issues, or truly duplicates. And I'll link the two "0xFFFF" issues together as related.

            srowen Sean R. Owen added a comment -

            OK, well, I think it's fine to leave one copy of the "0xFFFF" issue open if you have any good reason to suspect it's different, and just link the JIRAs. I suppose I was mostly saying this could just be reopened. Separately, there are a lot of real duplicates of similar issues out there too, making it hard to figure out what the underlying unique issues are.


            aeskilson Aleksander Eskilson added a comment -

            Sure, I apologize for that. I'll also mark it as a duplicate of SPARK-16845 and monitor its pull-request to see if it resolves the issue I opened.

            srowen Sean R. Owen added a comment -

            It may or may not be, though again I suspect a common cause with one of several JIRAs. The point here is to join potentially related discussion without conflating issues. I don't think it's useful to just make another JIRA rather than reopening this one, but this seems to be a losing battle.


            aeskilson Aleksander Eskilson added a comment -

            srowen, melentye
            I'm not so certain this error is the same as SPARK-16845. It seems like there have been several classes of errors all related to the sizes of individual methods growing beyond the 64 KB limit (SPARK-16845, SPARK-17702). I think this one is of a different class of error, or at least

            Constant pool has grown past JVM limit of 0xFFFF

            marks a different class of error. I was able to produce an error similar to the one first documented when trying to encode a Java object with a very wide and deeply nested schema. I've gone ahead and created a bug report for that, SPARK-18016, and in its description I've attached a small project that can reproduce the error.

            melentye Andrey Melentyev added a comment -

            Looks similar to https://issues.apache.org/jira/browse/SPARK-17217 btw

            melentye Andrey Melentyev added a comment -

            I tried wrapping the attached test in "withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false")" - it still fails in a nasty way, printing the content of the 300K LOC generated class in a seemingly endless loop. Running the code from spark-shell with --conf spark.sql.codegen.wholeStage=false fails as well.

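
For reference, the setting tried above can also be applied from Scala code rather than on the spark-shell command line. This is a minimal sketch, assuming a Spark 2.x SparkSession; the application name and master URL are placeholders and do not come from this thread. As reported above, disabling whole-stage codegen did not avoid the failure in this case.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder app name and master; adjust for the actual environment.
val spark = SparkSession.builder()
  .appName("wide-dataset-repro")
  .master("local[*]")
  // Same key as the --conf flag quoted in the comment above.
  .config("spark.sql.codegen.wholeStage", "false")
  .getOrCreate()

// The setting can also be changed on an already running session.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
```
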
            srowen Sean R. Owen added a comment -

            Yeah, I'm not 100% sure, though I strongly suspect a common cause. If it ends up being different we can reopen this. I thought it might be more productive to tie them together until it's clear they're not the same, but I don't mind much either way, whatever is most helpful.

            Can you try disabling whole stage codegen to see if that works around it?

            melentye Andrey Melentyev added a comment - edited

            srowen, are you sure it's a dup of SPARK-16845? The exceptions are a bit different; this one has

            Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection has grown past JVM limit of 0xFFFF
            

            while SPARK-16845 says

            Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
            

            Both are about something growing too large in the generated class source code, though.
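
The two messages quoted above correspond to two separate limits in the JVM class-file format: the constant pool count is stored in a 16-bit field (hence 0xFFFF), and the bytecode of a single method must be shorter than 64 KB. The sketch below only spells out that arithmetic. The object name, the helper, and any figures passed to it are hypothetical and are not part of Spark or Janino.

```scala
object ClassFileLimits {
  // constant_pool_count is an unsigned 16-bit field in the class file.
  val MaxConstantPoolCount: Int = 0xFFFF // 65535

  // code_length of a single method must be less than 65536 bytes (64 KB).
  val MaxMethodCodeBytes: Int = 64 * 1024 - 1 // 65535

  // Hypothetical helper: names the limit(s) a generated class would exceed.
  def exceeded(constantPoolCount: Int, largestMethodBytes: Int): Seq[String] =
    Seq(
      "constant pool" -> (constantPoolCount > MaxConstantPoolCount),
      "method size" -> (largestMethodBytes > MaxMethodCodeBytes)
    ).collect { case (name, true) => name }
}
```

With made-up figures, `ClassFileLimits.exceeded(70000, 10000)` names only the constant pool, while `ClassFileLimits.exceeded(2000, 70000)` names only the method size. All methods of a class share that class's constant pool, so splitting one large method into several smaller methods of the same class reduces method size without necessarily reducing the number of constant-pool entries.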

            srowen Sean R. Owen added a comment -

            Thanks melentye, let's roll this into the existing JIRA.


            melentye Andrey Melentyev added a comment -

            Patch for org.apache.spark.sql.DataFrameSuite with a test case reproducing the problem

            arisofalaska@gmail.com Aris Vlasakakis added a comment -

            Hi there,

            I discovered a bug, and it also pertains to code generation with many columns – although in my case the bugs within Janino code generation in Catalyst start after several hundred columns. Are these somehow related?

            My bug report was merged into this one: https://issues.apache.org/jira/browse/SPARK-16845

            zyoma Iaroslav Zeigerman added a comment -

            I get a different exception when trying to apply the mean function to all columns:

            import org.apache.spark.sql.functions.mean

            val allCols = df.columns.map(c => mean(c))
            val newDf = df.select(allCols: _*)
            newDf.show()
            
            java.io.EOFException
            	at java.io.DataInputStream.readFully(DataInputStream.java:197)
            	at java.io.DataInputStream.readFully(DataInputStream.java:169)
            	at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
            	at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
            	at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
            	at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:185)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
            	at scala.collection.Iterator$class.foreach(Iterator.scala:742)
            	at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
            	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
            	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
            	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
            	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
            ...

            People

              Assignee: Unassigned
              Reporter: zyoma Iaroslav Zeigerman
              Votes: 1
              Watchers: 5
