SPARK-41665

Spark streaming query scheduling synchronisation with Trigger Interval


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.8, 3.0.3, 3.1.2, 3.2.2, 3.3.1
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Hi,
      We have noticed a strange behavior in Spark Structured Streaming. When we set a trigger interval of, for example, 1 minute, every query triggers its batches at 0:00:00, 0:01:00, 0:02:00, and so on, no matter when the query was started.
      As a result, all queries are "synchronised", which can disturb the cluster: in our case it leads to spikes in utilisation.

      In my view, the expected behavior should look like this (see the attached images):

       

      This comes from the following line:
      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TriggerExecutor.scala#L98

      Because now and intervalMs are both Longs, now / intervalMs * intervalMs uses integer division and simply truncates the current time down to a multiple of the interval (in my case, it drops the seconds). This behavior is explicitly asserted in the test suite (https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/ProcessingTimeExecutorSuite.scala#L36), so I do not know whether it is intended or whether it is simply because this line has been there for 6 years. Either way, it affects every version released over the last 6 years.
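      To illustrate the arithmetic, here is a minimal Scala sketch. It assumes the linked line computes the next batch time as now / intervalMs * intervalMs + intervalMs, with now in epoch milliseconds, so each wait ends on the next wall-clock multiple of the interval regardless of when the query started. The object and function names are hypothetical, and startRelativeNextBatchTime is only one possible start-relative alternative, not existing Spark code or a confirmed fix.

```scala
// Hypothetical sketch of the batch-time arithmetic discussed in this issue.
// alignedNextBatchTime mirrors the assumed expression on the linked line;
// startRelativeNextBatchTime is a HYPOTHETICAL alternative, not part of Spark.
object TriggerAlignmentSketch {

  /** Next multiple of intervalMs strictly after `now` (integer division truncates). */
  def alignedNextBatchTime(now: Long, intervalMs: Long): Long =
    if (intervalMs == 0) now else now / intervalMs * intervalMs + intervalMs

  /** Hypothetical: next tick counted from the query's own start time `startMs`. */
  def startRelativeNextBatchTime(now: Long, startMs: Long, intervalMs: Long): Long =
    if (intervalMs == 0) now
    else startMs + ((now - startMs) / intervalMs + 1) * intervalMs

  def main(args: Array[String]): Unit = {
    val intervalMs = 60000L // 1-minute trigger interval
    // Two queries started 10 s and 35 s after a minute boundary (ms since 00:00:00).
    val startA = 10000L
    val startB = 35000L

    // Aligned formula: both queries wait until the same instant, 00:01:00.
    println(alignedNextBatchTime(startA, intervalMs)) // 60000
    println(alignedNextBatchTime(startB, intervalMs)) // 60000

    // Start-relative variant: each query keeps its own offset.
    println(startRelativeNextBatchTime(startA, startA, intervalMs)) // 70000
    println(startRelativeNextBatchTime(startB, startB, intervalMs)) // 95000
  }
}
```

      With a 1-minute interval, queries started at 00:00:10 and 00:00:35 both get 00:01:00 under the aligned formula, while the start-relative variant gives 00:01:10 and 00:01:35. Note that the start-relative variant only spreads the load as much as the query start times are spread: queries started at the same moment would still fire together.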

      Regards

      Thomas

       

      Attachments

        1. image-2022-12-21-07-57-32-654.png (8 kB, Thomas Prelle)
        2. image-2022-12-21-07-57-18-679.png (14 kB, Thomas Prelle)

            People

              Assignee: Unassigned
              Reporter: Thomas Prelle (tprelle-ubi)
              Votes: 0
              Watchers: 1

            Dates

              Created:
              Updated: