[HIVE-7768] Integrate with Spark executor scaling [Spark Branch] - ASF JIRA

Details

Type: Sub-task
Status: Resolved
Priority: Critical
Resolution: Done
Affects Version/s: None
Fix Version/s: None
Component/s: Spark
Labels:
- TODOC-SPARK

Description

Scenario:
A user connects to Hive and runs a query on a small table. Our SparkContext (SC) is sized for that small table. They then run a query on a much larger table. We'll need to "re-size" the SC, which I don't think Spark supports today, so we need to research what is available in Spark today and how Tez handles this.

More details:
Similar to Tez, it's likely our "SparkContext" is going to be long-lived and will process many queries. Some queries will be large and some small. Additionally, the SC might be idle for long periods of time.

In this JIRA we will research the following:

- How Spark decides the number of slaves for a given RDD today
- Given an SC, when you create a new RDD based on a much larger input dataset, does the SC adjust?
- How Tez increases/decreases the size of the running YARN application (set of slaves)
- How Tez handles the scenario where it has a running set of slaves in YARN, requests more resources for a query, and fails to get them
- How Tez decides to time out idle slaves

This will guide requirements we'll need from Spark.
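
For context on what such requirements can look like, the linked SPARK-3174 work was delivered in Spark as dynamic executor allocation. The fragment below is a minimal sketch in `spark-defaults.conf` style, not a setting taken from this issue: the property names are Spark's own, but every value is an illustrative assumption, and defaults and accepted units vary between Spark releases.

```
# Illustrative values only (assumptions, not recommendations from this issue).
# Let Spark add and remove executors based on the pending task backlog.
spark.dynamicAllocation.enabled              true
# Floor and ceiling for the executor count of one long-lived SparkContext.
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         20
# Release an executor after it has been idle this long (seconds assumed here).
spark.dynamicAllocation.executorIdleTimeout  60
# External shuffle service, generally needed so shuffle files outlive removed executors.
spark.shuffle.service.enabled                true
```

With settings along these lines, the small-table query would in principle run near the floor, the large-table query could grow toward the ceiling, and an idle SC would shrink back once the idle timeout passes.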

Issue Links

relates to

- HIVE-7516 Add capacity control over queries running on Spark cluster [Spark Branch] (Resolved)
- SPARK-3174 Provide elastic scaling within a Spark application (Closed)

People

Assignee: Chengxiang Li
Reporter: Brock Noland
Votes: 0
Watchers: 6

Dates

Created: 18/Aug/14 20:25
Updated: 23/Dec/14 13:25
Resolved: 23/Dec/14 13:25