Spark / SPARK-23598

WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.3.1, 2.4.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      I got the following stack trace for a large query plan with whole-stage codegen enabled:

      java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source)
      at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
      at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
      at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
      at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
      at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
      at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
      at org.apache.spark.scheduler.Task.run(Task.scala:109)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)

      After disabling codegen, everything works.

      The root cause seems to be that we are trying to call the protected append method of BufferedRowIterator from an inner class of a subclass that is loaded by a different class loader (after codegen compilation).

      https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4 states that a protected method R can be accessed only if one of the following two conditions is fulfilled:

      1. R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself.
      2. R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D.

      Condition 2 doesn't apply because the generated class is loaded by a different class loader (and is in a different package). Condition 1 doesn't apply because we are apparently calling the method from an inner class of a subclass of BufferedRowIterator, not from the subclass itself.

      Looking at the code path of WholeStageCodeGen, the following happens:

      1. In WholeStageCodeGen, we create the subclass of BufferedRowIterator, along with a processNext method for processing the output of the child plan.
      2. In the child, which is a HashAggregateExec, we create the method that shows up at the top of the stack trace (called doAggregateWithKeysOutput).
      3. We add this method to the compiled code by invoking addNewFunction of CodeGenerator. The generated function body calls the append method.

      Now, the addNewFunction method states that:

      If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class
      

      This indeed seems to happen: the doAggregateWithKeysOutput method is put into a new private inner class. Thus, it doesn't have access to the protected append method anymore but still tries to call it, which results in the IllegalAccessError. 
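      The class layout described above can be sketched in plain Java. This is only an illustration under stated assumptions: RowIteratorSketch, GeneratedIteratorSketch, and AggNestedClass are hypothetical stand-ins for BufferedRowIterator, the generated iterator, and the generated nested class, and rows are plain strings rather than InternalRow. Compiled with javac in a single package and class loader, it runs without error, so it shows the shape of the call, not the IllegalAccessError itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for BufferedRowIterator.
abstract class RowIteratorSketch {
    protected final List<String> currentRows = new ArrayList<>();

    // Protected, like BufferedRowIterator.append(InternalRow) in the report.
    protected void append(String row) {
        currentRows.add(row);
    }

    protected abstract void processNext();
}

// Hypothetical stand-in for the generated subclass (GeneratedIteratorForCodegenStage7).
final class GeneratedIteratorSketch extends RowIteratorSketch {
    private final AggNestedClass aggNested = new AggNestedClass();

    @Override
    protected void processNext() {
        aggNested.doAggregateWithKeysOutput("key-1", 42L);
    }

    // Stand-in for the private inner class that receives the function
    // once the outer class has grown too large.
    private final class AggNestedClass {
        void doAggregateWithKeysOutput(String key, long value) {
            // The call from the stack trace: a nested class invoking the
            // protected append method inherited by its enclosing class.
            append(key + "=" + value);
        }
    }

    List<String> rows() {
        return currentRows;
    }
}

class AppendAccessSketch {
    static List<String> run() {
        GeneratedIteratorSketch it = new GeneratedIteratorSketch();
        it.processNext();
        return it.rows();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

      In the reported failure the base class and the generated classes live in different packages and class loaders, which is the part this single-file sketch cannot reproduce.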

      Possible fixes:

      • Pass the inlineToOuterClass flag when invoking addNewFunction, so that the function stays in the outer class
      • Make the append method public
      • Re-declare the append method in the generated subclass (just invoking super). This way, inner classes should have access to it.
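      The third option could look roughly like the sketch below. It is not the actual patch: BaseIteratorSketch, FixedIteratorSketch, and AggNestedClass are hypothetical names, and rows are plain strings rather than InternalRow. The idea is that the nested class then calls a method declared by its own enclosing class, which shares its run-time package, instead of a protected method declared in a class from another package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for BufferedRowIterator.
abstract class BaseIteratorSketch {
    protected final List<String> currentRows = new ArrayList<>();

    protected void append(String row) {
        currentRows.add(row);
    }

    protected abstract void processNext();
}

// Hypothetical stand-in for the generated subclass with the proposed fix.
final class FixedIteratorSketch extends BaseIteratorSketch {
    private final AggNestedClass aggNested = new AggNestedClass();

    // Re-declared in the generated subclass; it only delegates to super.
    // Nested classes now resolve append against this declaration.
    @Override
    protected void append(String row) {
        super.append(row);
    }

    @Override
    protected void processNext() {
        aggNested.doAggregateWithKeysOutput("key-1", 42L);
    }

    private final class AggNestedClass {
        void doAggregateWithKeysOutput(String key, long value) {
            // Resolves to FixedIteratorSketch.append, declared in the
            // enclosing class rather than in the base class.
            append(key + "=" + value);
        }
    }

    List<String> rows() {
        return currentRows;
    }
}

class AppendFixSketch {
    static List<String> run() {
        FixedIteratorSketch it = new FixedIteratorSketch();
        it.processNext();
        return it.rows();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

      Compared with making append public, this keeps the change inside the generated code, at the cost of one extra delegating call per appended row.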

       

            People

            • Assignee: Kazuaki Ishizaki (kiszk)
            • Reporter: David Vogelbacher (dvogelbacher)