Flink / FLINK-13852

Support storing in-progress/pending files in different directories (StreamingFileSink)


Details

    Description

      Currently, in-progress and pending files are stored in the same directory as the final output files. This can be problematic depending on how the final output files are consumed. One example is loading the data into Hive, where we can only load all files in a given directory at once, so any in-progress or pending files in that directory would be picked up as well.

      I suggest we allow specifying a separate base path for pending/in-progress files. Under this path we would create the same bucketing structure as for the final files, but store only the non-final files there.
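      To make the proposed layout concrete, here is a minimal sketch of the path mapping, assuming the final and temporary base paths are plain local paths (java.nio.file.Path rather than Flink's own Path type). The class TmpPathMapper and its method toTmpPath are hypothetical names used for illustration only; they are not part of Flink. A file's location relative to the final base path (its bucket sub-directory plus file name) is reused unchanged under the temporary base path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Flink): maps a file in the final output
// tree to the matching location under a separate in-progress/pending base
// path, preserving the bucket sub-directory structure.
final class TmpPathMapper {
    private final Path finalBasePath;
    private final Path tmpBasePath;

    TmpPathMapper(Path finalBasePath, Path tmpBasePath) {
        this.finalBasePath = finalBasePath.normalize();
        this.tmpBasePath = tmpBasePath.normalize();
    }

    /** Returns the path under tmpBasePath that mirrors finalPath's position under finalBasePath. */
    Path toTmpPath(Path finalPath) {
        Path normalized = finalPath.normalize();
        if (!normalized.startsWith(finalBasePath)) {
            throw new IllegalArgumentException(finalPath + " is not under " + finalBasePath);
        }
        // e.g. "2019-08-26--10/part-0-0": bucket directory plus part file name
        Path relative = finalBasePath.relativize(normalized);
        return tmpBasePath.resolve(relative);
    }

    public static void main(String[] args) {
        TmpPathMapper mapper = new TmpPathMapper(
                Paths.get("/warehouse/table"), Paths.get("/warehouse/_tmp/table"));
        System.out.println(mapper.toTmpPath(
                Paths.get("/warehouse/table/2019-08-26--10/part-0-0")));
    }
}
```

      With this mapping, a consumer that lists /warehouse/table only ever sees finished files, while the writer keeps its in-progress and pending files under /warehouse/_tmp/table in identically named bucket directories.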

      To support this, we need to extend the RecoverableWriter interface with a new open method, for example:

      RecoverableFsDataOutputStream open(Path path, Path tmpPath) throws IOException;
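      The intended semantics of open(path, tmpPath) can be sketched on the local filesystem: bytes are written to tmpPath while the file is in progress or pending, and committing moves the file to path. This is a simplified, hypothetical stand-in, not Flink's RecoverableWriter implementation. The names TwoPathLocalWriter, TmpFileStream and Committer are assumptions made for illustration, and recovery (persisting and resuming a stream) is deliberately left out. It also assumes both paths live on the same filesystem so that the final move can be atomic.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical, simplified model of the proposed open(Path path, Path tmpPath):
// data lives under tmpPath until commit() moves it to the final path.
// Persist/recover support, which the real interface needs, is omitted.
final class TwoPathLocalWriter {

    /** Publishes a closed, pending file to its final location. */
    interface Committer {
        void commit() throws IOException;
    }

    static TmpFileStream open(Path path, Path tmpPath) throws IOException {
        return new TmpFileStream(path, tmpPath);
    }

    /** Stream that writes to tmpPath and can later be committed to finalPath. */
    static final class TmpFileStream extends OutputStream {
        private final Path finalPath;
        private final Path tmpPath;
        private final OutputStream out;

        TmpFileStream(Path finalPath, Path tmpPath) throws IOException {
            this.finalPath = finalPath;
            this.tmpPath = tmpPath;
            Path parent = tmpPath.getParent();
            if (parent != null) {
                Files.createDirectories(parent); // mirror the bucket directory in the tmp tree
            }
            this.out = Files.newOutputStream(tmpPath);
        }

        @Override public void write(int b) throws IOException { out.write(b); }
        @Override public void write(byte[] b, int off, int len) throws IOException { out.write(b, off, len); }
        @Override public void flush() throws IOException { out.flush(); }
        @Override public void close() throws IOException { out.close(); }

        /** Closes the stream; the file stays in the tmp tree (pending) until commit() is called. */
        Committer closeForCommit() throws IOException {
            out.close();
            return () -> {
                Path parent = finalPath.getParent();
                if (parent != null) {
                    Files.createDirectories(parent); // bucket directory in the final tree
                }
                // Atomic rename: readers of the final directory never see a partial file.
                Files.move(tmpPath, finalPath, StandardCopyOption.ATOMIC_MOVE);
            };
        }
    }
}
```

      Keeping the temporary tree separate means the final directory only ever changes through a rename of a complete file. In an actual RecoverableWriter, tmpPath would presumably also have to be recorded in the recoverable state so that a restored job can find and commit the pending file.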


          People

            Assignee: Unassigned
            Reporter: Gyula Fora (gyfora)
            Votes: 0
            Watchers: 2
