Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.4.3
None
None
Description
Hi,
We are using Spark Structured Streaming to consume data from a Kafka cluster, transform it, and write it to an S3 object store service. Our Spark metadata compact files currently have 4,046,119 lines (1.2GB), and whenever a compact file is written, batch processing is delayed: those batches take 1.8-2 minutes, whereas a batch usually takes 10-15 seconds. This delay causes consumer lag in Kafka, which is a significant problem for us. Is this a known issue, and if so, are there any plans to fix it?
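For a rough sense of scale, the figures above can be turned into an average line size and a slowdown factor. This is only a back-of-the-envelope sketch: it assumes "1.2GB" means 1.2 × 10^9 bytes, and the variable names are made up for illustration.

```python
# Figures quoted in the description above.
compact_lines = 4_046_119          # lines in the metadata compact file
compact_bytes = 1.2e9              # assumption: "1.2GB" read as 1.2 * 10**9 bytes

normal_batch_s = (10, 15)          # usual batch duration, in seconds
slow_batch_s = (1.8 * 60, 2 * 60)  # batch duration when a compact file is written

# Average size of one line in the compact file.
bytes_per_line = compact_bytes / compact_lines

# Slowdown range: the low end compares the shortest slow batch with the
# longest normal batch, the high end compares the other way round.
slowdown_min = slow_batch_s[0] / normal_batch_s[1]
slowdown_max = slow_batch_s[1] / normal_batch_s[0]

print(f"about {bytes_per_line:.0f} bytes per line")
print(f"compaction batches are roughly {slowdown_min:.1f}x to {slowdown_max:.1f}x slower")
```

Under these assumptions, each compaction batch rewrites on the order of a few hundred bytes for every line, and takes several times longer than a normal batch.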