Spark / SPARK-33606

Delay in batch processing when creating Spark metadata compact file

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.3
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None

    Description

      Hi,

      We are using Spark Structured Streaming to consume data from a Kafka cluster, transform it, and write it to an S3 Objectstore Service. Our Spark metadata compact files currently have 4,046,119 lines (1.2 GB). Whenever a compact file is written, batch processing slows down: a batch takes 1.8-2 minutes to process, when it usually takes 10-15 seconds. This delay builds up consumer lag in Kafka, which is a significant problem for us. Is this a known issue, and if so, are there any plans to fix it?
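
      For context, here is a rough sketch of why the slowdown would show up periodically rather than on every batch. It assumes the file sink compacts its metadata log with the default interval of 10 (spark.sql.streaming.fileSink.log.compactInterval) and that a batch is a compaction batch when (batchId + 1) is divisible by that interval, which is my understanding of the rule and not something confirmed in this report. The function names are illustrative, not Spark APIs, and the timings plugged in are the upper ends of the ranges quoted above.

```python
# Sketch only: mirrors the compaction rule as I understand it for Spark's
# file sink metadata log. Function names are hypothetical, not Spark APIs.

def is_compaction_batch(batch_id, compact_interval=10):
    """True when this batch would rewrite the whole log into a .compact file."""
    return (batch_id + 1) % compact_interval == 0


def compaction_batches(num_batches, compact_interval=10):
    """Batch ids in [0, num_batches) that would trigger a compaction."""
    return [b for b in range(num_batches)
            if is_compaction_batch(b, compact_interval)]


def average_batch_seconds(normal_s, compaction_s, compact_interval=10):
    """Mean batch duration over one full compaction cycle."""
    regular = (compact_interval - 1) * normal_s
    return (regular + compaction_s) / compact_interval


if __name__ == "__main__":
    # With the assumed default interval, every tenth batch is the slow one.
    print(compaction_batches(30))          # [9, 19, 29]
    # 15 s normal batches, 120 s compaction batch (figures from the report).
    print(average_batch_seconds(15, 120))  # 25.5
```

      Under these assumptions, one batch in ten pays the full cost of rewriting the 1.2 GB log, which would match the periodic lag we see in Kafka.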

          People

            Assignee: Unassigned
            Reporter: Stela Stefanova (stelast)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated: