SPARK-35421

Remove redundant ProjectExec from streaming queries with V2Relation

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Streaming queries with V2Relation can have a redundant ProjectExec in their physical plans.
      You can easily reproduce this with the following code.

      import org.apache.spark.sql.streaming.Trigger
      
      val query = spark.
        readStream.
        format("rate").
        option("rowsPerSecond", 1000).
        option("rampUpTime", "10s").
        load().
        selectExpr("timestamp", "100", "value").
        writeStream.
        format("console").
        trigger(Trigger.ProcessingTime("5 seconds")).
        // trigger(Trigger.Continuous("5 seconds")). // You can reproduce with continuous processing too.
        outputMode("append").
        start()
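
To look at the physical plan the description refers to, one option is to let the query run for a few seconds and then call `explain()` on it. The following is a minimal sketch, not part of the original report: it assumes a local session (the app name and the 15-second wait are arbitrary choices), and the exact plan text will vary with the Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Assumption: a local session for a standalone run.
// In spark-shell a `spark` session already exists and this line can be skipped.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("SPARK-35421-repro") // hypothetical name, chosen for illustration
  .getOrCreate()

// Same query shape as the reproduction above.
val query = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 1000)
  .load()
  .selectExpr("timestamp", "100", "value")
  .writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("5 seconds"))
  .outputMode("append")
  .start()

// Wait up to 15 seconds so at least one micro-batch has executed.
// Returns false if the query is still running when the timeout expires.
val terminated = query.awaitTermination(15000L)

// Prints the physical plan of the most recent execution to the console.
query.explain()

query.stop()
```

In the printed plan, look at the Project nodes around the V2 scan; the issue concerns an extra one that does no useful work.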
      

          People

            Assignee: Kousuke Saruta (sarutak)
            Reporter: Kousuke Saruta (sarutak)