  1. Flink
  2. FLINK-13843

Unify and clean up StreamingFileSink format builders


Details

    Description

      I think the StreamingFileSink has some problems that will affect us in the long run if we intend this sink to be the main exactly-once FS sink.

      1. Code duplication

      The StreamingFileSink currently has 2 builders for row and bulk formats:

      RowFormatBuilder, BulkFormatBuilder

      Both contain almost exactly the same config settings, and the duplicated code should be moved to a common superclass (StreamingFileSink.BucketsBuilder).
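      One possible shape for such a superclass is a self-typed builder, so that shared setters still return the concrete builder type. The sketch below is only an illustration under assumptions: the class names (CommonBucketsBuilderSketch, RowBuilderSketch, BulkBuilderSketch), the default values, and the use of plain strings in place of the real assigner, encoder and writer factory types are all hypothetical and are not Flink's actual API.

```java
// Hypothetical sketch: shared builder options live in one self-typed base class.
// Strings stand in for the real assigner/encoder/writer-factory types.
abstract class CommonBucketsBuilderSketch<T extends CommonBucketsBuilderSketch<T>> {
    protected long bucketCheckInterval = 60_000L; // assumed default, in milliseconds
    protected String bucketAssigner = "date-time"; // assumed default

    // Each subclass returns itself so chained calls keep the concrete type.
    protected abstract T self();

    T withBucketCheckInterval(long intervalMillis) {
        this.bucketCheckInterval = intervalMillis;
        return self();
    }

    T withBucketAssigner(String assigner) {
        this.bucketAssigner = assigner;
        return self();
    }
}

// Row-format builder: adds only the row-specific option.
final class RowBuilderSketch extends CommonBucketsBuilderSketch<RowBuilderSketch> {
    protected String encoder = "simple-string";

    @Override
    protected RowBuilderSketch self() {
        return this;
    }

    RowBuilderSketch withEncoder(String encoder) {
        this.encoder = encoder;
        return this;
    }
}

// Bulk-format builder: adds only the bulk-specific option.
final class BulkBuilderSketch extends CommonBucketsBuilderSketch<BulkBuilderSketch> {
    protected String writerFactory = "none";

    @Override
    protected BulkBuilderSketch self() {
        return this;
    }

    BulkBuilderSketch withWriterFactory(String writerFactory) {
        this.writerFactory = writerFactory;
        return this;
    }
}
```

      With this layout, a call such as new RowBuilderSketch().withBucketCheckInterval(5_000L).withEncoder("utf8") compiles because the shared setter returns RowBuilderSketch rather than the base type.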

      2. Inconsistent config options

      I also noticed some strange/invalid configuration settings for the builders:

      - RowFormatBuilder#withBucketAssignerAndPolicy: feels like an internal method and is not used anywhere. It also overwrites the bucket factory.

      - BulkFormatBuilder#withBucketAssigner: takes an extra type parameter for the bucket ID type, which the row format variant does not.

      - BulkFormatBuilder#withBucketCheckInterval: does not affect behavior, because the bulk format always uses the OnCheckpointRollingPolicy.

      This can probably be solved by fixing the code duplication.

      3. Fragmented configuration

      This is not a big problem, and it only affects the part file config options that were introduced recently. We have added two methods: withPartFilePrefix and withPartFileSuffix.

      I think we should aim to group configs that belong together, for example behind a single withPartFileConfig method.
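      As a rough illustration of the grouping idea, the prefix and suffix could travel together in one small immutable value object that the builder accepts through a single setter. Everything in this sketch is an assumption made for illustration: PartFileConfigSketch, SinkBuilderSketch, the default values, and the prefix-subtask-counter naming scheme are hypothetical and do not describe the actual Flink classes.

```java
import java.util.Objects;

// Hypothetical value object grouping the part file options that belong together.
final class PartFileConfigSketch {
    private final String prefix;
    private final String suffix;

    PartFileConfigSketch(String prefix, String suffix) {
        this.prefix = Objects.requireNonNull(prefix, "prefix");
        this.suffix = Objects.requireNonNull(suffix, "suffix");
    }

    String prefix() {
        return prefix;
    }

    String suffix() {
        return suffix;
    }

    // Assumed naming scheme for illustration: <prefix>-<subtask>-<counter><suffix>.
    String fileName(int subtaskIndex, long partCounter) {
        return prefix + "-" + subtaskIndex + "-" + partCounter + suffix;
    }
}

// Hypothetical builder exposing one grouped setter instead of two separate ones.
final class SinkBuilderSketch {
    private PartFileConfigSketch partFileConfig = new PartFileConfigSketch("part", "");

    SinkBuilderSketch withPartFileConfig(PartFileConfigSketch config) {
        this.partFileConfig = Objects.requireNonNull(config, "config");
        return this;
    }

    PartFileConfigSketch partFileConfig() {
        return partFileConfig;
    }
}
```

      Grouping the options this way means a later addition (another part file option, say) would change only the value object, not the builder's method list.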

       

          People

            Assignee: Unassigned
            Reporter: Gyula Fora (gyfora)
            Votes: 0
            Watchers: 4

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 40m