Spark / SPARK-37458

Remove unnecessary object serialization on foreachBatch

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Currently, ForeachBatchSink converts the batch's RDD[InternalRow] to an RDD[T] and wraps it in ExternalRDD in order to provide a Dataset[T] to the user function. This adds a SerializeFromObject node to the plan, which is not actually required.

      We can use LogicalRDD instead, which wraps the RDD[InternalRow] directly and therefore removes SerializeFromObject from the plan.
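
      One way to observe the effect from user code is to print the plan of each micro-batch inside the foreachBatch function. The following is a minimal sketch, not code from the patch: the object name, the rate source, the local master, and the 10-second run time are all illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object; the names here are not part of the Spark code base.
object ForeachBatchPlanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")               // assumption: local run for inspection only
      .appName("foreachBatch-plan-demo")
      .getOrCreate()

    // The built-in "rate" source generates rows continuously, so no external
    // system is needed to drive the stream.
    val stream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    // Declaring the function with an explicit Scala function type avoids
    // overload ambiguity between the Scala and Java variants of foreachBatch.
    val printPlan: (DataFrame, Long) => Unit = (batch, batchId) => {
      println(s"=== plan for batch $batchId ===")
      // Extended explain prints the logical plans as well as the physical plan.
      batch.explain(true)
    }

    val query = stream.writeStream
      .foreachBatch(printPlan)
      .start()

    // Let a few micro-batches run, then shut down.
    query.awaitTermination(10000L)
    query.stop()
    spark.stop()
  }
}
```

      On a build that includes this change, the printed plans would be expected to show the batch backed by LogicalRDD with no SerializeFromObject node above it. On earlier builds, SerializeFromObject over ExternalRDD should appear instead. The exact plan text may differ between Spark versions.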

          People

            Assignee: Jungtaek Lim (kabhwan)
            Reporter: Jungtaek Lim (kabhwan)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: