Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
1.18.0
None
None
Description
Currently, Flink's FileSource uses LocalityAwareSplitAssigner as its default FileSplitAssigner, which does not guarantee any ordering of splits. In many scenarios that involve processing historical data, reading files in order is a requirement, especially with event-time processing.
I believe a new FileSplitAssigner that supports ordering should be implemented, and FileSourceBuilder should be extended to allow choosing a different FileSplitAssigner.
Clearly, files may still not be read in perfect order with parallelism > 1. In some cases, however, running with a parallelism of 1 might be acceptable.
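To make the proposal concrete, here is a minimal sketch of what an ordering assigner could look like. It is an illustration under assumptions, not a proposed patch: SimpleSplit and OrderedSplitAssigner are hypothetical names, the class only mirrors the method shape of FileSplitAssigner (getNext, addSplits, remainingSplits) so that it compiles without Flink on the classpath, and the ordering key (path, then offset) is an assumption, since this issue does not specify how files should be ordered.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

/** Hypothetical stand-in for Flink's FileSourceSplit: just a path and an offset. */
final class SimpleSplit {
    final String path;
    final long offset;

    SimpleSplit(String path, long offset) {
        this.path = path;
        this.offset = offset;
    }
}

/**
 * Hypothetical assigner that hands out splits in ascending (path, offset) order.
 * The three methods mirror the shape of Flink's FileSplitAssigner interface,
 * but this class does not implement it, so the sketch compiles without Flink.
 */
final class OrderedSplitAssigner {
    // Assumed ordering: lexicographic by path, then by offset within a file.
    private static final Comparator<SimpleSplit> ORDER =
            Comparator.comparing((SimpleSplit s) -> s.path)
                    .thenComparingLong(s -> s.offset);

    private final PriorityQueue<SimpleSplit> pending = new PriorityQueue<>(ORDER);

    OrderedSplitAssigner(Collection<SimpleSplit> initialSplits) {
        pending.addAll(initialSplits);
    }

    /** Returns the smallest pending split; the hostname is ignored (no locality). */
    Optional<SimpleSplit> getNext(String hostname) {
        return Optional.ofNullable(pending.poll());
    }

    /** Re-queues splits, e.g. after a reader failure, keeping the ordering. */
    void addSplits(Collection<SimpleSplit> splits) {
        pending.addAll(splits);
    }

    /** Snapshot of the pending splits, sorted in assignment order. */
    List<SimpleSplit> remainingSplits() {
        List<SimpleSplit> copy = new ArrayList<>(pending);
        copy.sort(ORDER);
        return copy;
    }
}
```

Note that such an assigner only orders the assignment of splits. With more than one reader, splits are still processed concurrently, so records can be emitted out of order; the guarantee is only meaningful with a parallelism of 1.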