SPARK-18091

Deep if expressions cause Generated SpecificUnsafeProjection code to exceed JVM code size limit

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 2.0.3, 2.1.0
    • Component/s: SQL
    • Labels: None

    Description

      Problem Description:
      I have an application that evaluates many nested if-else decisions to generate its output. It fails with the following exception:
      Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
      at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
      at org.codehaus.janino.CodeContext.write(CodeContext.java:874)
      at org.codehaus.janino.CodeContext.writeBranch(CodeContext.java:965)
      at org.codehaus.janino.UnitCompiler.writeBranch(UnitCompiler.java:10261)

      Steps to Reproduce:
      I've come up with a unit test which I was able to run in CodeGenerationSuite.scala:

      test("split large if expressions into blocks due to JVM code size limit") {
        // Column 0 holds the input string; column 1 holds the regex group index (0 = whole match).
        val row = create_row("afafFAFFsqcategory2dadDADcategory8sasasadscategory24", 0)
        val inputStr = 'a.string.at(0)
        val inputIdx = 'a.int.at(1)

        val length = 10
        val valuesToCompareTo = for (i <- 1 to (length + 1)) yield ("category" + i)

        val initCondition = EqualTo(RegExpExtract(inputStr, Literal("category1"), inputIdx), valuesToCompareTo(0))
        var res: Expression = If(initCondition, Literal("category1"), Literal("NULL"))
        var cumulativeCondition: Expression = Not(initCondition)
        // Each iteration wraps the previous result in another If, so the expression tree keeps getting deeper.
        for (index <- 1 to length) {
          val valueExtractedFromInput = RegExpExtract(inputStr, Literal("category" + (index + 1).toString), inputIdx)
          val currComparee = If(cumulativeCondition, valueExtractedFromInput, Literal("NULL"))
          val currCondition = EqualTo(currComparee, valuesToCompareTo(index))
          val combinedCond = And(cumulativeCondition, currCondition)
          res = If(combinedCond, If(combinedCond, valueExtractedFromInput, Literal("NULL")), res)
          cumulativeCondition = And(Not(currCondition), cumulativeCondition)
        }

        // Generate the projection for the single deep expression and evaluate it on the row.
        val expressions = Seq(res)
        val plan = GenerateUnsafeProjection.generate(expressions, true)
        val actual = plan(row).toSeq(expressions.map(_.dataType))
        val expected = Seq(UTF8String.fromString("category2"))

        if (!checkResult(actual, expected)) {
          fail(s"Incorrect Evaluation: expressions: $expressions, actual: $actual, expected: $expected")
        }
      }
      

      Root Cause:
      The current splitting of projection code works per output column: it does not (and cannot) split the code generated for a single output column's expression. A deeply nested expression can therefore produce a method that exceeds the JVM's 64 KB limit.
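
      To illustrate the root cause, here is a minimal sketch of greedy per-column chunking. It is a toy model, not Spark's actual splitting code: the object name ChunkingSketch, the pack function, and the character-count limit are all assumptions made for illustration. Because a whole snippet is the smallest unit that can be moved, a single oversized snippet still ends up as a single oversized method body.

```scala
object ChunkingSketch {
  // Hypothetical helper: greedily packs whole per-column code snippets into
  // method bodies of at most `limit` characters. A snippet is never cut in
  // the middle, so one snippet longer than `limit` becomes one oversized body.
  def pack(snippets: Seq[String], limit: Int): Vector[String] =
    snippets.foldLeft(Vector.empty[String]) { (bodies, snippet) =>
      bodies.lastOption match {
        case Some(last) if last.length + snippet.length <= limit =>
          bodies.init :+ (last + snippet) // still fits: extend the current body
        case _ =>
          bodies :+ snippet               // start a new body
      }
    }
}
```

      For example, ChunkingSketch.pack(Seq("aa", "bb", "cc"), 4) yields the two bodies "aabb" and "cc", while a single 10-character snippet with the same limit comes back unchanged as one 10-character body.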

      Note: This issue seems related to SPARK-14887, but I'm not sure whether the root cause is the same.

      Proposed Fix:
      The If expression should place the generated code for its predicate, true-value, and false-value expressions in separate methods in the context and call those methods, instead of inlining all of that code directly into its own generated code.
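
      The idea can be sketched with a toy code generator. This is only an illustration under stated assumptions, not the actual patch: Expr, Lit, Check, IfExpr, Context, and addFunction are hypothetical names, none of them Spark's Expression or CodegenContext API, and the emitted Java-like text is never compiled. The sketch compares inlining every child into its parent against emitting the predicate, true value, and false value as separate methods.

```scala
import scala.collection.mutable.ArrayBuffer

object IfSplitSketch {
  // Toy expression tree. These types are hypothetical and are not Spark's
  // Expression, If, or CodegenContext classes.
  sealed trait Expr
  final case class Lit(value: String) extends Expr
  final case class Check(name: String) extends Expr // opaque boolean test on the row
  final case class IfExpr(pred: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr

  // Inline strategy: each child's code is pasted into its parent's code, so
  // the resulting text grows with the depth of the tree.
  def inline(e: Expr): String = e match {
    case Lit(v)          => "\"" + v + "\""
    case Check(n)        => s"$n(row)"
    case IfExpr(p, t, f) => s"(${inline(p)} ? ${inline(t)} : ${inline(f)})"
  }

  // Collects the helper methods emitted by the split strategy.
  final class Context {
    private val methods = ArrayBuffer.empty[String]
    def addFunction(returnType: String, body: String): String = {
      val name = s"fn_${methods.size}"
      methods += s"private $returnType $name(Object row) { return $body; }"
      name
    }
    def declared: Vector[String] = methods.toVector
  }

  // Split strategy: predicate, true value, and false value each go into their
  // own method, and the If itself only emits three calls.
  def split(e: Expr, ctx: Context): String = e match {
    case Lit(v)   => "\"" + v + "\""
    case Check(n) => s"$n(row)"
    case IfExpr(p, t, f) =>
      val predFn  = ctx.addFunction("boolean", split(p, ctx))
      val trueFn  = ctx.addFunction("String", split(t, ctx))
      val falseFn = ctx.addFunction("String", split(f, ctx))
      s"($predFn(row) ? $trueFn(row) : $falseFn(row))"
  }

  // Builds a chain of `depth` nested Ifs, each one wrapping the previous result.
  def deep(depth: Int): Expr =
    (1 to depth).foldLeft[Expr](Lit("NULL")) { (acc, i) =>
      IfExpr(Check(s"isCategory$i"), Lit(s"category$i"), acc)
    }
}
```

      With IfSplitSketch.deep(200), the inlined text runs to several thousand characters in one piece, whereas the split strategy emits 600 small methods, each under 120 characters. The cost in this toy model is one extra method call per sub-expression.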

          People

            Assignee: Kapil Singh (kapilsingh5050)
            Reporter: Kapil Singh (kapilsingh5050)
            Votes: 0
            Watchers: 5