Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24815

Structured Streaming should support dynamic allocation

    XMLWordPrintableJSON

Details

    Description

      For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job is taking no more resources than necessary. In cloud environments, it enables autoscaling.

      However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors if the task backlog is a certain size, and removes executors if they idle for a certain period of time.

      Quick thoughts:

      1) Dynamic allocation should be pluggable, rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA).

      2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing might need its own algorithm.

      3) Spark should print a warning if you run a structured streaming job when Core's dynamic allocation is enabled

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              Karthik Palaniappan Karthik Palaniappan
              Votes:
              11 Vote for this issue
              Watchers:
              26 Start watching this issue

              Dates

                Created:
                Updated: