Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-35329

Split generated switch code into pieces in ExpandExec

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      This ticket aim at splitting generated switch code into smaller ones in `ExpandExec`. In the current master, even a simple query like the one below generates a large method whose size (`maxMethodCodeSize:7448`) is close to `8000` (`CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT`);

      scala> val df = Seq(("2016-03-27 19:39:34", 1, "a"), ("2016-03-27 19:39:56", 2, "a"), ("2016-03-27 19:39:27", 4, "b")).toDF("time", "value", "id")
      scala> val rdf = df.select(window($"time", "10 seconds", "3 seconds", "0 second"), $"value").orderBy($"window.start".asc, $"value".desc).select("value")
      scala> sql("SET spark.sql.adaptive.enabled=false")
      scala> import org.apache.spark.sql.execution.debug._ 
      scala> rdf.debugCodegen
      
      Found 2 WholeStageCodegen subtrees.
      == Subtree 1 / 2 (maxMethodCodeSize:7448; maxConstantPoolSize:189(0.29% used); numInnerClasses:0) ==
                                          ^^^^
      *(1) Project [window#34.start AS _gen_alias_39#39, value#11]
      +- *(1) Filter ((isnotnull(window#34) AND (cast(time#10 as timestamp) >= window#34.start)) AND (cast(time#10 as timestamp) < window#34.end))
         +- *(1) Expand [List(named_struct(start, precisetimestampcon...
      

      Attachments

        Activity

          People

            maropu Takeshi Yamamuro
            maropu Takeshi Yamamuro
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: