Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-11116

Overwrite outdated in-progress files in StreamingFileSink.

    XMLWordPrintableJSON

Details

    Description

      In order to guarantee exactly-once semantics, the streaming file sink is implementing a two-phase commit protocol when writing files to the filesystem.

      Initially data is written to in-progress files. These files are then put into "pending" state when they are completed (based on the rolling policy), and they are finally committed when the checkpoint that put them in the "pending" state is acknowledged as complete.

      The above shows that in the case that we have:
      1) checkpoints A, B, C coming
      2) checkpoint A being acknowledged and
      3) failure

      Then we may have files that do not belong to any checkpoint (because B and C were not considered successful). These files are currently not cleaned up.

      In order to reduce the amount of such files created, we removed the random suffix from in-progress temporary files, so that the next in-progress file that is opened for this part, overwrites them.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              kkl0u Kostas Kloudas
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m