Flink / FLINK-31285

FileSource should support reading files in order

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.18.0
    • None
    • None

    Description

      Currently, Flink's FileSource uses LocalityAwareSplitAssigner as its default FileSplitAssigner, which does not guarantee any order in which files are read. In many scenarios that involve processing historical data, reading files in order can be a requirement, especially with event-time processing.

      I believe a new FileSplitAssigner should be implemented that supports ordering. FileSourceBuilder should be extended to allow choosing a different FileSplitAssigner.
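
      To illustrate the idea, here is a minimal sketch of what an order-preserving assigner could look like. It is a standalone model rather than Flink code: SimpleSplit and OrderedSplitAssigner are hypothetical names, and the three methods only loosely mirror the shape of FileSplitAssigner. Sorting by path and then by offset is an assumption; a real implementation might prefer modification time or a user-supplied comparator.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

// Hypothetical stand-in for a file split; NOT Flink's FileSourceSplit.
final class SimpleSplit {
    final String path;
    final long offset;

    SimpleSplit(String path, long offset) {
        this.path = path;
        this.offset = offset;
    }
}

// Hypothetical assigner that hands out splits ordered by path, then by offset.
final class OrderedSplitAssigner {
    private final PriorityQueue<SimpleSplit> queue;

    OrderedSplitAssigner(Collection<SimpleSplit> initial) {
        Comparator<SimpleSplit> order = Comparator
                .comparing((SimpleSplit s) -> s.path)
                .thenComparingLong(s -> s.offset);
        this.queue = new PriorityQueue<>(order);
        this.queue.addAll(initial);
    }

    // The hostname is ignored here: ordering takes precedence over locality.
    Optional<SimpleSplit> getNext(String hostname) {
        return Optional.ofNullable(queue.poll());
    }

    // Splits handed back (for example after a reader failure) re-enter
    // the queue at their sorted position.
    void addSplits(Collection<SimpleSplit> splits) {
        queue.addAll(splits);
    }

    // Returns a sorted snapshot of the splits that are still unassigned.
    Collection<SimpleSplit> remainingSplits() {
        List<SimpleSplit> copy = new ArrayList<>(queue);
        copy.sort(queue.comparator());
        return copy;
    }
}
```

      With a single reader requesting splits, this model yields files strictly in sorted order. With several readers, the assignment order is still sorted, but the readers process their splits concurrently.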

      It's also clear that files may not be read in perfect order with parallelism > 1, since multiple readers process their splits concurrently. However, in some cases, using a parallelism of 1 might be fine.

          People

            Assignee: Unassigned
            Reporter: Yaroslav Tkachenko (sap1ens)