[FLINK-30773] Add API for rescaling of jobs based on per-vertex parallelism overrides - ASF JIRA

XML

Word

Printable

JSON

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.18.0
Component/s: Autoscaler, Runtime / Coordination, Runtime / REST
Labels:
None

Description

~~FLINK-29501~~ introduced a way to rescale jobs via a user-provided parallelism overrides map. This feature is already used today by the Autoscaler of the Flink Kubernetes operator. However, it requires a full restart of the Flink job and only supports the application deployment mode.

In a K8s environment, this is inefficient because all pods for a deployment will be surrendered. Upon restart, they have to be re-acquired. In addition to being slow, this can also lead to situations where resource constraints prevent a restart from executing properly.

Ideally, we would would want the following to happen on receiving a rescale request:

Rescale API receives a request with a parallelism overrides map (vertexId => parallelism) for a jobId
Compute the number of required task slots using the overrides and the current JobGraph
1. If the total number of task slots for the cluster is less than the required number of task slots of the rescale, acquire the missing task slots. Otherwise, do nothing
2. Wait for new task slots to become available
3. Abort rescale request on timeout
Redeploy the JobGraph / Tasks with the updated parallelisms
Surrender any unused task slots in case of scaling down

There is an existing "Rescaling" API which is currently disabled. We have to evaluate whether reusing it makes sense.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

meces.patch
23/Jan/23 16:59
767 kB
Pedro Cardoso Silva

Issue Links

is fixed by

FLINK-31316 FLIP-291: Externalized Declarative Resource Management

In Progress

is related to

FLINK-29501 Allow overriding JobVertex parallelisms during job submission

Resolved

FLINK-30260 FLIP-271: Add initial Autoscaling implementation

Closed

Activity

People

Assignee:: Maximilian Michels

Reporter:: Maximilian Michels

Votes:: 0 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 23/Jan/23 16:02

Updated:: 28/Apr/23 10:04

Resolved:: 28/Apr/23 10:04