[FLINK-9586] Make collection sources parallelisable - ASF JIRA

Details

Type: Improvement
Status: Open
Priority: Not a Priority
Resolution: Unresolved
Affects Version/s: 1.5.0
Fix Version/s: None
Component/s: API / DataSet
Labels:

Description

The note at https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/datastream_api.html#collection-data-sources states that Collections are mainly intended for testing and do not support parallelism. I believe this is an unnecessary restriction: there are surely plenty of use cases where the data to distribute is already at hand. It seems strange that Flink cannot parallelise a fixed collection of inputs, which forces users to write their Collections to a text file and re-read them just to get parallelism.
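
To make the request concrete, here is a minimal sketch of how a fixed collection could be divided among parallel source subtasks. It is plain Java and does not use any Flink API; the class `CollectionSplitter` and its `split` method are hypothetical names introduced only for illustration, and the contiguous-slice strategy is an assumption, not a description of how Flink would or does implement this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: shows one way to partition
// an in-memory collection so that each parallel subtask reads its own slice.
class CollectionSplitter {

    /**
     * Splits items into numSplits contiguous slices whose sizes differ by at most one.
     * Slice i would be handed to parallel subtask i. Slices may be empty when
     * numSplits exceeds the number of items.
     */
    static <T> List<List<T>> split(List<T> items, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<List<T>> splits = new ArrayList<>(numSplits);
        int base = items.size() / numSplits;      // minimum elements per slice
        int remainder = items.size() % numSplits; // first 'remainder' slices get one extra
        int start = 0;
        for (int i = 0; i < numSplits; i++) {
            int end = start + base + (i < remainder ? 1 : 0);
            // Copy the view so each slice is independent of the original list.
            splits.add(new ArrayList<>(items.subList(start, end)));
            start = end;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        List<List<Integer>> splits = split(data, 3);
        for (int i = 0; i < splits.size(); i++) {
            System.out.println("subtask " + i + " -> " + splits.get(i));
        }
    }
}
```

With seven elements and a parallelism of 3, the slices are `[1, 2, 3]`, `[4, 5]` and `[6, 7]`. Contiguous slices preserve the original order within each subtask; a round-robin assignment would be an equally plausible choice if ordering does not matter.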