Flink / FLINK-10168

Add FileFilter interface and FileModTimeFilter which sets a read start position for files by modification time

Details

    Description

      Update: The motivation is to 1) let users set a read start position for files, so that they can process only files modified after a given timestamp, and 2) expose more file information to users and give them a more flexible file filter interface for defining their own filtering rules.

      ---------------

      Support filtering files by modified/created time in StreamExecutionEnvironment.readFile().

      For example, in a source directory with lots of files, we may only want to read files that were created or modified after a specific time.

      This API can expose a generic file filter function and let users define the filtering rules. Currently Flink only supports filtering files by path: the API is FileInputFormat.setFilesFilter(FilePathFilter), which takes a single file path filter. A more generic API that accepts more kinds of filters could look like either of the following:

      1) FileInputFormat.setFilesFilters(List(PathFilter, ModifiedTimeFilter, ...))

      2) FileInputFormat.setFilesFilters(FileFilter), where FileFilter exposes all file attributes that Flink's file system can provide, such as path and modified time

      I lean towards the 2nd option, because it gives users more flexibility to define complex filtering rules based on combinations of file attributes.
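
A minimal sketch of what option 2 might look like, written without any Flink dependencies so it can be run on its own. Everything here is an assumption made for illustration: `FileInfo` is a hypothetical stand-in for the attributes Flink's file system reports, the `accept` method name and its "true means keep the file" semantics are arbitrary choices, and the `and` combinator is just one possible way to compose rules. It is not the interface that was proposed or merged.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of option 2; these types are illustrative, not Flink's API.
final class FileFilterSketch {

    /** Stand-in for the file attributes a file system can report. */
    static final class FileInfo {
        final String path;
        final long modificationTime; // epoch milliseconds

        FileInfo(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Generic filter: returns true if the file should be read. */
    interface FileFilter extends Serializable {
        boolean accept(FileInfo file);

        /** Combines two filters; a file must pass both to be read. */
        default FileFilter and(FileFilter other) {
            return file -> accept(file) && other.accept(file);
        }
    }

    /** Accepts only files modified strictly after the read start position. */
    static final class FileModTimeFilter implements FileFilter {
        private static final long serialVersionUID = 1L;
        private final long startTimeMillis;

        FileModTimeFilter(long startTimeMillis) {
            this.startTimeMillis = startTimeMillis;
        }

        @Override
        public boolean accept(FileInfo file) {
            return file.modificationTime > startTimeMillis;
        }
    }

    /** Applies a filter to a directory listing and returns the accepted files. */
    static List<FileInfo> select(List<FileInfo> files, FileFilter filter) {
        List<FileInfo> accepted = new ArrayList<>();
        for (FileInfo file : files) {
            if (filter.accept(file)) {
                accepted.add(file);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<FileInfo> listing = Arrays.asList(
                new FileInfo("/data/old.csv", 500L),
                new FileInfo("/data/new.csv", 1_500L),
                new FileInfo("/data/new.tmp", 2_000L));

        // Read start position at t=1000, restricted to .csv files.
        FileFilter filter = new FileModTimeFilter(1_000L)
                .and(file -> file.path.endsWith(".csv"));

        for (FileInfo file : select(listing, filter)) {
            System.out.println(file.path);
        }
    }
}
```

Because a single filter sees every attribute at once, a rule such as "modified after T and ending in .csv" is one composed filter here, whereas option 1 would need a separate filter type per attribute.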

       

          People

            Assignee: Unassigned
            Reporter: Bowen Li (phoenixjiangnan)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m