Flink / FLINK-11838

Create RecoverableWriter for GCS


    Details

      Description

      GCS supports resumable uploads, which we can use to create a RecoverableWriter similar to the S3 implementation:
      https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload

      After using the Hadoop-compatible interface (https://github.com/apache/flink/pull/7519), we noticed that the current implementation relies heavily on renaming files on commit:
      https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
      This is suboptimal on an object store such as GCS, where a rename is typically carried out as a copy followed by a delete. We would therefore like to implement a more GCS-native RecoverableWriter.
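
To illustrate the concern, here is a minimal sketch of a toy in-memory object store. It is not the GCS client API and not Flink's RecoverableWriter interface; every class and method name (`ObjectStoreSketch`, `rename`, `startSession`, `append`, `persistedOffset`, `finish`) is hypothetical. It assumes a rename costs a full copy plus a delete, while a resumable-style session writes directly to the final object name and exposes an offset that a checkpoint could record.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory object store. Hypothetical names throughout; this is NOT the
// GCS client API and NOT Flink's RecoverableWriter interface.
class ObjectStoreSketch {
    private final Map<String, byte[]> objects = new HashMap<>();
    private final Map<String, ByteArrayOutputStream> sessions = new HashMap<>();
    long bytesCopied = 0; // bytes rewritten by rename-style commits

    void put(String name, byte[] data) {
        objects.put(name, data.clone());
    }

    byte[] get(String name) {
        return objects.get(name);
    }

    // Rename modeled as copy + delete, so its cost grows with the object size.
    void rename(String from, String to) {
        byte[] data = objects.remove(from);
        if (data == null) {
            throw new IllegalArgumentException("no such object: " + from);
        }
        objects.put(to, data.clone());
        bytesCopied += data.length;
    }

    // Opens a resumable-style session that targets the final object name.
    void startSession(String finalName) {
        sessions.put(finalName, new ByteArrayOutputStream());
    }

    // Appends a chunk and returns the persisted offset after the write.
    long append(String finalName, byte[] chunk) {
        ByteArrayOutputStream session = requireSession(finalName);
        session.write(chunk, 0, chunk.length);
        return session.size();
    }

    // Offset a writer could resume from after a failure.
    long persistedOffset(String finalName) {
        return requireSession(finalName).size();
    }

    // Commit: the object becomes visible under its final name.
    // No previously stored object is copied, so bytesCopied is unchanged.
    void finish(String finalName) {
        ByteArrayOutputStream session = requireSession(finalName);
        sessions.remove(finalName);
        objects.put(finalName, session.toByteArray());
    }

    private ByteArrayOutputStream requireSession(String finalName) {
        ByteArrayOutputStream session = sessions.get(finalName);
        if (session == null) {
            throw new IllegalStateException("no open session for: " + finalName);
        }
        return session;
    }

    public static void main(String[] args) {
        byte[] part = "hello, gcs".getBytes(StandardCharsets.UTF_8);

        // Commit by rename: write to a temporary name, then rename.
        ObjectStoreSketch renameStore = new ObjectStoreSketch();
        renameStore.put("tmp/part-0", part);
        renameStore.rename("tmp/part-0", "out/part-0");
        System.out.println("rename commit, bytes copied: " + renameStore.bytesCopied);

        // Commit by finishing a session opened against the final name.
        ObjectStoreSketch sessionStore = new ObjectStoreSketch();
        sessionStore.startSession("out/part-0");
        long offset = sessionStore.append("out/part-0", part);
        sessionStore.finish("out/part-0");
        System.out.println("session commit, bytes copied: " + sessionStore.bytesCopied
                + ", last persisted offset: " + offset);
    }
}
```

In this toy model the rename path rewrites every byte of the part file at commit time, whereas the session path only finalizes data that was already uploaded. How a real GCS-native writer would persist and restore the session state is left open here.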

            People

            • Assignee: Galen Warren (galenwarren)
            • Reporter: Fokko Driesprong (fokko)

              Dates

              • Created:
                Updated:

              Time Tracking

              Estimated: Not Specified
              Remaining: 0h
              Logged: 20m