[FLINK-11838] Create RecoverableWriter for GCS

Details

    Description

      GCS supports resumable uploads, which we can use to create a RecoverableWriter similar to the S3 implementation:
      https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload

      After using the Hadoop-compatible interface (https://github.com/apache/flink/pull/7519),
      we noticed that the current implementation relies heavily on renaming files on commit:
      https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
      This is suboptimal on an object store such as GCS, where a rename is not a cheap metadata operation. Therefore, we would like to implement a more GCS-native RecoverableWriter.
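
      To illustrate why a rename-on-commit scheme can be costly on an object store, here is a minimal sketch. It assumes, as is common for object stores, that a rename is emulated as a copy followed by a delete. The class RenameCostSketch and all of its methods are hypothetical and exist only for this illustration; they are not part of the GCS client library or of Flink.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory object store (hypothetical; not the GCS or Flink API). */
public class RenameCostSketch {
    private final Map<String, byte[]> objects = new HashMap<>();
    private long bytesCopied = 0;

    /** Stores an object under the given name. */
    public void put(String name, byte[] data) {
        objects.put(name, data.clone());
    }

    /**
     * Assumption: the store has no native rename, so it is emulated
     * as "copy to the new name, then delete the old name".
     */
    public void rename(String from, String to) {
        byte[] data = objects.get(from);
        if (data == null) {
            throw new IllegalArgumentException("no such object: " + from);
        }
        objects.put(to, data.clone()); // full copy of the payload
        bytesCopied += data.length;    // cost grows with object size
        objects.remove(from);          // separate delete step
    }

    public boolean exists(String name) {
        return objects.containsKey(name);
    }

    public long bytesCopied() {
        return bytesCopied;
    }

    public static void main(String[] args) {
        RenameCostSketch store = new RenameCostSketch();
        // An in-progress file is written first, then renamed on commit.
        store.put("bucket/.part-0.inprogress", new byte[1024]);
        store.rename("bucket/.part-0.inprogress", "bucket/part-0");
        System.out.println("bytes copied on commit: " + store.bytesCopied());
    }
}
```

      In this model the commit cost is proportional to the size of the file, and the copy and the delete are two separate steps rather than one atomic operation. A writer that finalizes the uploaded data directly under its target name would avoid both issues.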

            People

              galenwarren Galen Warren
              fokko Fokko Driesprong
              Votes: 3
              Watchers: 14

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m