Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-24815

Structured Streaming should support dynamic allocation

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Labels:
      None

      Description

      For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job is taking no more resources than necessary. In cloud environments, it enables autoscaling.

      However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors if the task backlog is a certain size, and removes executors if they idle for a certain period of time.

      Quick thoughts:

      1) Dynamic allocation should be pluggable, rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA).

      2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing might need its own algorithm.

      3) Spark should print a warning if you run a structured streaming job when Core's dynamic allocation is enabled

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              Karthik Palaniappan Karthik Palaniappan
            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated: