Spark / SPARK-19372

Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0, 2.3.0
    • None
    • None

    Description

      For the attached CSV file, the code below fails with the following exception:

      org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB

      Code:

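        import org.apache.spark.SparkConf
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.functions.lit

        val conf = new SparkConf().setMaster("local[1]")
        val sqlContext = SparkSession.builder().config(conf).getOrCreate().sqlContext

        // Load the attached 400-column CSV file.
        val dataframe =
          sqlContext
            .read
            .format("com.databricks.spark.csv")
            .load("wide400cols.csv")

        // OR together one inequality per column, 400 in total.
        val filter = (0 to 399)
          .foldLeft(lit(false))((e, index) => e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))

        // Apply the predicate and display up to 100 rows.
        val filtered = dataframe.filter(filter)
        filtered.show(100)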
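
To give a rough sense of how large this predicate is, the sketch below models the shape of the expression that the foldLeft builds. It is only a toy model: the case classes Lit, NotEqual and Or, the object PredicateShape, and the column names c0, c1, ... are hypothetical stand-ins, not Catalyst's own classes or the real column names. It assumes nothing about how Spark generates code, beyond the exception message saying that a single generated method exceeded the JVM's 64 KB per-method limit.

```scala
// Toy expression tree. These case classes are illustrative only and are
// NOT Spark Catalyst's expression classes.
sealed trait Expr
case class Lit(value: Boolean) extends Expr
case class NotEqual(column: String, literal: String) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object PredicateShape {
  // Same fold as in the report: each step wraps the previous result in a
  // new Or node, so the tree grows as a left-deep chain.
  // The column names "c0", "c1", ... are hypothetical.
  def build(n: Int): Expr =
    (0 until n).foldLeft(Lit(false): Expr) { (e, i) =>
      Or(e, NotEqual(s"c$i", s"column${i + 1}"))
    }

  // Total number of nodes in the tree.
  def size(e: Expr): Int = e match {
    case Or(l, r) => 1 + size(l) + size(r)
    case _        => 1
  }

  // Number of nodes on the longest root-to-leaf path.
  def depth(e: Expr): Int = e match {
    case Or(l, r) => 1 + math.max(depth(l), depth(r))
    case _        => 1
  }
}
```

For 400 columns, PredicateShape.build(400) has 801 nodes (400 Or nodes, 400 comparisons and the initial literal) and a depth of 401. If the code for every node ends up in one generated method, as the exception message suggests, a predicate of this size can plausibly exceed the limit.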
      

      Attachments

        1. wide400cols.csv
          7 kB
          Jay Pranavamurthi

          People

            Assignee: Kazuaki Ishizaki (kiszk)
            Reporter: Jay Pranavamurthi (jay.pranavamurthi)
            Votes: 1
            Watchers: 11
