Flink / FLINK-9749 Rework Bucketing Sink / FLINK-10963

Cleanup small objects uploaded to S3 as independent objects


Details

    Description

      The S3 RecoverableWriter uses the Multipart Upload (MPU) feature of S3 to upload part files. A large part file is therefore split into chunks of at least 5MB, and each chunk is uploaded independently as soon as it is ready.

      The 5MB minimum requires special handling when a checkpoint barrier arrives and the data buffered for a part is still smaller than 5MB. Such small files are uploaded as independent objects (not associated with an active MPU). When Flink needs to restore, it simply downloads them and resumes writing to them.
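
The split described above can be illustrated with a minimal sketch. It is not Flink's RecoverableWriter code: the class and method names are hypothetical, "5MB" is assumed to mean 5 MiB, and chunks are assumed to be cut at exactly the minimum size, whereas the real writer may use larger chunks.

```java
/** Hypothetical sketch; not part of Flink's API. */
class CheckpointSplitSketch {

    /** Assumed minimum size of a non-final MPU part (5 MiB). */
    static final long MIN_PART_SIZE = 5L * 1024 * 1024;

    /** What has to be persisted for a given amount of buffered data. */
    static final class Plan {
        final long fullParts;  // chunks that can be uploaded as MPU parts
        final long tailBytes;  // remainder that is too small for an MPU part

        Plan(long fullParts, long tailBytes) {
            this.fullParts = fullParts;
            this.tailBytes = tailBytes;
        }
    }

    /** Splits the buffered bytes into MPU-sized chunks plus a small tail. */
    static Plan planAtCheckpoint(long bufferedBytes) {
        if (bufferedBytes < 0) {
            throw new IllegalArgumentException("bufferedBytes must be >= 0");
        }
        long fullParts = bufferedBytes / MIN_PART_SIZE;
        long tailBytes = bufferedBytes % MIN_PART_SIZE;
        return new Plan(fullParts, tailBytes);
    }
}
```

In this sketch, a non-zero `tailBytes` corresponds to the small file that is uploaded as an independent object at the checkpoint.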

      These small objects are currently never cleaned up, which wastes space on S3.
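
One possible cleanup strategy, shown only as a sketch and not necessarily the approach taken for this issue, is to remember the key of each small object per checkpoint and delete the older ones once a later checkpoint completes. The example below uses an in-memory map as a stand-in for the S3 bucket. All names and the key layout are hypothetical, and it assumes that only the most recently completed checkpoint is needed for recovery.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; the map below stands in for an S3 bucket. */
class TailObjectCleanupSketch {

    /** In-memory stand-in for the bucket: object key -> object bytes. */
    final Map<String, byte[]> bucket = new HashMap<>();

    /** Keys of the small objects written so far, ordered by checkpoint id. */
    private final TreeMap<Long, String> tailKeys = new TreeMap<>();

    /** Stores the small tail for a checkpoint as an independent object. */
    String persistTail(long checkpointId, byte[] tail) {
        String key = "tmp/tail-" + checkpointId; // made-up key layout
        bucket.put(key, tail);
        tailKeys.put(checkpointId, key);
        return key;
    }

    /**
     * Deletes the small objects of all checkpoints older than the completed
     * one and returns how many objects were removed.
     */
    int onCheckpointComplete(long checkpointId) {
        // headMap is a live view: clearing it also removes the entries from tailKeys.
        Map<Long, String> superseded = tailKeys.headMap(checkpointId, false);
        int deleted = 0;
        for (String key : superseded.values()) {
            if (bucket.remove(key) != null) {
                deleted++;
            }
        }
        superseded.clear();
        return deleted;
    }
}
```

Deleting only after a later checkpoint has completed keeps the object that a restore from the latest completed checkpoint would still need to download.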

          People

            kkl0u Kostas Kloudas