SPARK-49259: Size-based partition creation during Kafka read


    Description

      Currently, Spark + Kafka structured streaming provides the minPartitions config to create more partitions than the Kafka topic has. This is helpful for increasing parallelism, but the value cannot be changed dynamically.

      It would be better to increase the number of Spark partitions dynamically based on input size: when the input size is high, create more partitions. We could take the average message size and a maxBytesPerPartition limit as input and create partitions dynamically to handle varying loads, as sketched below.
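      A minimal sketch of the proposed sizing logic, assuming the planner already knows the number of pending offsets for a topic-partition and keeps a running average message size. The names numSplits, avgMsgSizeBytes, and maxBytesPerPartition are illustrative only; maxBytesPerPartition echoes the config proposed above and is not an existing Spark option:

```scala
object SizeBasedPartitioning {

  /**
   * Estimate how many Spark partitions a Kafka topic-partition's
   * pending offset range should be split into, based on bytes.
   *
   * @param pendingOffsets       number of unread messages in the range
   * @param avgMsgSizeBytes      observed average message size in bytes
   * @param maxBytesPerPartition target upper bound of bytes per Spark partition
   */
  def numSplits(pendingOffsets: Long,
                avgMsgSizeBytes: Long,
                maxBytesPerPartition: Long): Int = {
    require(avgMsgSizeBytes > 0 && maxBytesPerPartition > 0,
      "sizes must be positive")
    val estimatedBytes = pendingOffsets * avgMsgSizeBytes
    // Round up so no split exceeds maxBytesPerPartition,
    // and always produce at least one split.
    math.max(1, math.ceil(estimatedBytes.toDouble / maxBytesPerPartition).toInt)
  }
}
```

      For example, a backlog of 10,000,000 messages averaging 1 KiB with maxBytesPerPartition = 128 MiB yields ceil(10,240,000,000 / 134,217,728) = 77 splits, whereas a fixed minPartitions would produce the same count regardless of load.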

People

  Assignee: Subham Singhal
  Reporter: Subham Singhal