  Spark / SPARK-24815

Structured Streaming should support dynamic allocation


    Details

      Description

      For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job takes no more resources than necessary. In cloud environments, it enables autoscaling.

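      However, if you set spark.dynamicAllocation.enabled=true and run a Structured Streaming job, the batch dynamic allocation algorithm is applied to it. That algorithm requests more executors when the backlog of pending tasks reaches a certain size and removes executors that have been idle for a certain period of time. These heuristics were designed for batch workloads, not for the recurring micro-batches of a streaming query.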
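To make the mismatch concrete, here is a deliberately simplified Python model of the batch heuristic described above. `ClusterState`, `batch_decision`, and the thresholds are hypothetical names chosen for illustration; Spark's real logic lives in its `ExecutorAllocationManager`, which ramps requests up over several rounds rather than asking for everything at once.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    # Hypothetical snapshot of what an allocation policy can observe.
    pending_tasks: int        # tasks waiting for a free task slot
    cores_per_executor: int   # task slots per executor
    executor_idle_secs: dict  # executor id -> seconds since it last ran a task


def batch_decision(state, backlog_threshold=1, idle_timeout_secs=60.0):
    """Return (executors_to_add, executors_to_remove) under a simplified batch policy.

    Assumption: add enough executors to cover the whole backlog in one step, and
    remove any executor idle for at least idle_timeout_secs.
    """
    to_add = 0
    if state.pending_tasks >= backlog_threshold:
        # Ceiling division: executors needed so every pending task gets a slot.
        to_add = -(-state.pending_tasks // state.cores_per_executor)
    # Executors that have sat idle long enough are released.
    to_remove = sorted(
        e for e, idle in state.executor_idle_secs.items() if idle >= idle_timeout_secs
    )
    return to_add, to_remove
```

For example, `batch_decision(ClusterState(pending_tasks=10, cores_per_executor=4, executor_idle_secs={"1": 8.0, "2": 8.0}))` returns `(3, [])`. In this model, a query with a 10-second trigger whose micro-batches finish in about 2 seconds leaves executors idle for roughly 8 seconds at a time, which never reaches the 60-second timeout, so they are never released; meanwhile each micro-batch's burst of tasks looks like a backlog and triggers new requests.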

      Quick thoughts:

      1) Dynamic allocation should be pluggable, rather than hard-coded to a particular implementation in SparkContext.scala (this should be a separate JIRA).

      2) We should build a Structured Streaming algorithm that is separate from the batch algorithm. Eventually, continuous processing might need its own algorithm.

      3) Spark should print a warning if you run a Structured Streaming job while Core's dynamic allocation is enabled.
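The three ideas above can be sketched together in Python. Everything here is hypothetical: `AllocationPolicy` stands in for a pluggable interface (idea 1), `StreamingRatioPolicy` is one possible streaming-specific algorithm that scales on the ratio of micro-batch processing time to trigger interval (idea 2), and `warn_if_core_allocation_enabled` shows the warning check (idea 3) against a plain dict standing in for the Spark configuration. Only the key `spark.dynamicAllocation.enabled` comes from this issue; the ratio thresholds and metric names are illustrative assumptions.

```python
import warnings
from abc import ABC, abstractmethod


class AllocationPolicy(ABC):
    """Hypothetical plug-in point: decides the target executor count."""

    @abstractmethod
    def target_executors(self, current: int, metrics: dict) -> int:
        ...


class StreamingRatioPolicy(AllocationPolicy):
    """Scale on how much of each trigger interval a micro-batch uses (assumed design)."""

    def __init__(self, scale_up_ratio=0.9, scale_down_ratio=0.3,
                 min_executors=1, max_executors=100):
        self.scale_up_ratio = scale_up_ratio
        self.scale_down_ratio = scale_down_ratio
        self.min_executors = min_executors
        self.max_executors = max_executors

    def target_executors(self, current, metrics):
        interval = metrics["trigger_interval_ms"]
        if interval <= 0:
            raise ValueError("trigger_interval_ms must be positive")
        ratio = metrics["batch_duration_ms"] / interval
        if ratio >= self.scale_up_ratio:
            target = current + 1   # batches nearly fill the interval: falling behind
        elif ratio <= self.scale_down_ratio:
            target = current - 1   # batches finish early: over-provisioned
        else:
            target = current       # within the comfortable band: hold steady
        # Clamp to the configured bounds.
        return max(self.min_executors, min(self.max_executors, target))


def warn_if_core_allocation_enabled(conf: dict, is_streaming: bool) -> bool:
    """Warn (and return True) when a streaming job runs with batch dynamic allocation."""
    enabled = str(conf.get("spark.dynamicAllocation.enabled", "false")).lower() == "true"
    if is_streaming and enabled:
        warnings.warn(
            "spark.dynamicAllocation.enabled=true applies batch heuristics "
            "to a Structured Streaming query",
            UserWarning,
        )
        return True
    return False
```

Keeping the decision behind an abstract interface means the batch heuristic, a streaming policy like this one, and a future continuous-processing policy could be swapped without touching the code that actually requests or releases executors.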


            People

            • Assignee:
              Unassigned
              Reporter:
              Karthik Palaniappan
            • Votes:
              2
              Watchers:
              9
