  Spark / SPARK-19372

Code generation for Filter predicate including many OR conditions exceeds JVM method size limit


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0, 2.3.0
    • Component/s: None
    • Labels: None

      Description

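      For the attached CSV file, the code below fails. The predicate it builds ORs together 400 column comparisons, and the method Spark generates to evaluate that predicate exceeds the JVM's 64 KB limit on the bytecode size of a single method. The exception is:

        org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB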

      Code:

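        import org.apache.spark.SparkConf
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.lit

        // Run locally with a single worker thread
        val conf = new SparkConf().setMaster("local[1]")
        val sqlContext = SparkSession.builder().config(conf).getOrCreate().sqlContext
      
        // Load the attached 400-column CSV file
        val dataframe =
          sqlContext
            .read
            .format("com.databricks.spark.csv")
            .load("wide400cols.csv")
      
        // Build a single predicate:
        // false OR (col0 != "column1") OR (col1 != "column2") OR ... OR (col399 != "column400")
        val filter = (0 to 399)
          .foldLeft(lit(false))((e, index) => e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
      
        val filtered = dataframe.filter(filter)
        // On 2.1.0 this action throws the JaninoRuntimeException quoted above
        filtered.show(100)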
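
      The foldLeft in the repro produces a left-deep chain of 400 Or nodes on top of lit(false). The exception shows that code for this entire chain ends up in one generated method. The sketch below is a pure-Scala model of that expression, not Spark code. Expr, Lit, Neq, Or, buildFilter, depth, eval and keeps are hypothetical names chosen for illustration. The model assumes SQL three-valued logic: a null cell makes =!= return null, and Or returns true if either side is true, otherwise null if either side is null, otherwise false. A filter keeps a row only when the predicate is exactly true.

```scala
// Hypothetical model of the Column expression built by the repro's foldLeft.
// Not Spark's API: Expr, Lit, Neq, Or and the helpers below are illustrative only.
object OrChainModel {
  sealed trait Expr
  final case class Lit(value: Boolean) extends Expr
  // Models col(index) =!= expected; a None cell models a SQL NULL
  final case class Neq(index: Int, expected: String) extends Expr
  final case class Or(left: Expr, right: Expr) extends Expr

  // Same shape as the repro: ((lit(false) OR c0) OR c1) ... OR c(n-1), folded left
  def buildFilter(numCols: Int): Expr =
    (0 until numCols).foldLeft(Lit(false): Expr)((e, i) => Or(e, Neq(i, s"column${i + 1}")))

  // Tree depth: the left spine contributes one level per Or node
  def depth(e: Expr): Int = e match {
    case Or(l, r) => 1 + math.max(depth(l), depth(r))
    case _        => 1
  }

  // Three-valued evaluation: Some(true), Some(false), or None (SQL NULL)
  def eval(e: Expr, row: IndexedSeq[Option[String]]): Option[Boolean] = e match {
    case Lit(v)           => Some(v)
    case Neq(i, expected) => row(i).map(_ != expected)
    case Or(l, r) =>
      (eval(l, row), eval(r, row)) match {
        case (Some(true), _) | (_, Some(true)) => Some(true)
        case (Some(false), Some(false))        => Some(false)
        case _                                 => None
      }
  }

  // A filter keeps a row only when the predicate evaluates to true (not NULL)
  def keeps(e: Expr, row: IndexedSeq[Option[String]]): Boolean =
    eval(e, row).contains(true)
}
```

      For example, buildFilter(400) has depth 401. A row in which every cell equals its expected "columnN" value is filtered out. Changing any single cell to a different non-null value keeps the row, even when another cell is null.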
      

        Attachments

        1. wide400cols.csv (7 kB, uploaded by Jay Pranavamurthi)


            People

            • Assignee: Kazuaki Ishizaki (kiszk)
            • Reporter: Jay Pranavamurthi (jay.pranavamurthi)
            • Votes: 1
            • Watchers: 12
