SPARK-49259: Size-based partition creation during Kafka read


    Description

      Currently, Spark + Kafka structured streaming provides the minPartitions config to create more partitions than the Kafka topic has. This is helpful for increasing parallelism, but the value cannot be changed dynamically.

      It would be better to increase the number of Spark partitions dynamically based on input size: when the input size is high, create more partitions. We could take the average message size and a maxBytesPerPartition limit as input and create partitions dynamically to handle varying loads, as sketched below.
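      A minimal sketch of the proposed sizing logic, assuming the planner already knows the number of pending offsets for a topic-partition and keeps a running average message size. The names numSplits, avgMsgSizeBytes, and maxBytesPerPartition are illustrative only; maxBytesPerPartition echoes the config proposed above and is not an existing Spark option:

```scala
object SizeBasedPartitioning {

  /**
   * Estimate how many Spark partitions a Kafka topic-partition's
   * pending offset range should be split into, based on bytes.
   *
   * @param pendingOffsets       number of unread messages in the range
   * @param avgMsgSizeBytes      observed average message size in bytes
   * @param maxBytesPerPartition target upper bound of bytes per Spark partition
   */
  def numSplits(pendingOffsets: Long,
                avgMsgSizeBytes: Long,
                maxBytesPerPartition: Long): Int = {
    require(avgMsgSizeBytes > 0 && maxBytesPerPartition > 0,
      "sizes must be positive")
    val estimatedBytes = pendingOffsets * avgMsgSizeBytes
    // Round up so no split exceeds maxBytesPerPartition,
    // and always produce at least one split.
    math.max(1, math.ceil(estimatedBytes.toDouble / maxBytesPerPartition).toInt)
  }
}
```

      For example, a backlog of 10,000,000 messages averaging 1 KiB with maxBytesPerPartition = 128 MiB yields ceil(10,240,000,000 / 134,217,728) = 77 splits, whereas a fixed minPartitions would produce the same count regardless of load.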

People

  Assignee: Subham Singhal
  Reporter: Subham Singhal