Spark / SPARK-40849

Async log purge

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Purging old entries in both the offset log and commit log will be done asynchronously.

       

      For every micro-batch, older entries in both the offset log and the commit log are deleted so that the logs do not grow without bound. See the current logic here:

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala#L539
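To make the per-batch purge concrete, here is a minimal Python sketch of the synchronous behavior described above. It is not Spark's implementation: `InMemoryLog`, `run_batches`, and `min_entries_to_keep` are hypothetical names, and the logs are plain dictionaries rather than files on a checkpoint location.

```python
class InMemoryLog:
    """Hypothetical stand-in for a metadata log keyed by batch id."""

    def __init__(self):
        self.entries = {}

    def add(self, batch_id, value):
        # Refuse to overwrite an existing batch; report whether the write happened.
        if batch_id in self.entries:
            return False
        self.entries[batch_id] = value
        return True

    def purge(self, threshold_batch_id):
        # Drop every entry strictly older than the threshold.
        for batch_id in [b for b in self.entries if b < threshold_batch_id]:
            del self.entries[batch_id]


def run_batches(num_batches, min_entries_to_keep):
    """Simulate micro-batches that purge both logs inline on every batch."""
    offset_log, commit_log = InMemoryLog(), InMemoryLog()
    for batch_id in range(num_batches):
        offset_log.add(batch_id, f"offsets-{batch_id}")
        # ... the batch itself would be processed here ...
        commit_log.add(batch_id, f"commit-{batch_id}")
        # Inline purge: the micro-batch waits for this work to finish.
        if min_entries_to_keep < batch_id:
            threshold = batch_id - min_entries_to_keep
            offset_log.purge(threshold)
            commit_log.purge(threshold)
    return offset_log, commit_log
```

In this sketch, `run_batches(10, 3)` leaves batch ids 6 through 9 in both logs, and every purge runs on the same thread as the batch.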

       

      The time spent performing these log purges is counted in the “walCommit” execution time in the StreamingQueryListener progress metrics. Around two thirds of the “walCommit” execution time is spent on these purge operations, so making them asynchronous will also reduce micro-batch latency.

      Also, we do not necessarily need to purge on every micro-batch. When purges run asynchronously, they do not block micro-batch execution, and we do not need to start another purge until the current one has finished. The purges can essentially happen in the background. We only have to synchronize them with the offset WAL commits and the completion commits so that the offset log and commit log are never modified concurrently.
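The proposal can be sketched as follows, again in Python and again under assumptions: `LockedLog` and `AsyncPurger` are hypothetical names, a single lock per log stands in for whatever synchronization the real change uses, and a one-thread executor plays the role of the background purge. The sketch shows two ideas from the description: a purge is skipped while the previous one is still running, and log writes and purges never touch a log at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedLog:
    """Hypothetical in-memory log whose writes and purges share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def add(self, batch_id, value):
        with self._lock:
            self._entries[batch_id] = value

    def purge(self, threshold_batch_id):
        # Drop every entry strictly older than the threshold.
        with self._lock:
            for b in [b for b in self._entries if b < threshold_batch_id]:
                del self._entries[b]

    def batch_ids(self):
        with self._lock:
            return sorted(self._entries)


class AsyncPurger:
    """Runs purges on a background thread, at most one at a time."""

    def __init__(self, logs):
        self._logs = logs
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._in_flight = None  # touched only by the micro-batch thread

    def maybe_purge(self, threshold_batch_id):
        # Skip this round if the previous purge has not finished yet.
        if self._in_flight is not None and not self._in_flight.done():
            return False
        self._in_flight = self._executor.submit(self._purge_all, threshold_batch_id)
        return True

    def _purge_all(self, threshold_batch_id):
        for log in self._logs:
            log.purge(threshold_batch_id)

    def close(self):
        # Wait for any running purge before shutting down.
        self._executor.shutdown(wait=True)


def run_batches_async(num_batches, min_entries_to_keep):
    """Simulate micro-batches that hand purging to the background thread."""
    offset_log, commit_log = LockedLog(), LockedLog()
    purger = AsyncPurger([offset_log, commit_log])
    for batch_id in range(num_batches):
        offset_log.add(batch_id, f"offsets-{batch_id}")
        commit_log.add(batch_id, f"commit-{batch_id}")
        if min_entries_to_keep < batch_id:
            purger.maybe_purge(batch_id - min_entries_to_keep)  # does not block
    purger.close()
    return offset_log, commit_log
```

Because some purges may be skipped, the logs can temporarily hold more than the minimum number of entries. The entries that must be retained are never removed, and a later purge catches up on anything that was skipped.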

          People

            Assignee: Boyang Jerry Peng (jerrypeng)
            Reporter: Boyang Jerry Peng (jerrypeng)
            Votes: 0
            Watchers: 3
