Flink / FLINK-19589

Support per-connector FileSystem configuration

Details

    Description

      Currently, file system options can only be configured globally. However, in many cases users would like to configure them at a finer granularity, for example per connector.

      Either we add a properties map to our connectors, similar to the Kafka or Kinesis properties.
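
      A minimal sketch of how such a per-connector properties map might be layered over the global configuration. The class and method names are hypothetical and not part of Flink; the property keys and values are only examples.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink API): overlays per-connector options on the global ones. */
final class ConnectorFileSystemOptions {

    private ConnectorFileSystemOptions() {}

    /**
     * Returns a new map holding every global option, where any key that also
     * appears in connectorOverrides takes the connector-specific value.
     * Neither input map is modified.
     */
    static Map<String, String> effectiveOptions(
            Map<String, String> globalOptions, Map<String, String> connectorOverrides) {
        Map<String, String> effective = new HashMap<>(globalOptions);
        effective.putAll(connectorOverrides);
        return effective;
    }
}
```

      With this layering, a sink could override a single key such as fs.s3a.acl.default while inheriting everything else from the global configuration.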

      Or something like:

      Management of two properties related to S3 object management: object tagging and lifecycle configuration.

      Being able to control these is useful for people who run jobs that use S3 for checkpointing or job output, but who need per-job configuration of tagging and lifecycle for auditing or cost control (for example, deleting old state from S3).

      Ideally, it would be possible to control this on each object being written by Flink, or at least at a job level.

      Note: Some related existing properties can already be set through the Hadoop module via system properties; see, for example,

      fs.s3a.acl.default

      which sets the default ACL on written objects.

      Solutions:

      1) Modify hadoop module:

      The above-linked module could be updated in order to have a new property (and similar for lifecycle)
      fs.s3a.tags.default
      which could be a comma-separated list of tags to set. For example

      fs.s3a.tags.default = "jobname:JOBNAME,owner:OWNER"

      This seems like a natural place to put this logic (and it is outside of Flink, if we decide to go this way). However, it does not allow a sink and a checkpoint to have different values for these properties.
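
      A minimal sketch of how the proposed comma-separated tag list might be parsed, assuming each entry has the form key:value as in the example above. The class is hypothetical and exists in neither Hadoop nor Flink.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the proposed "key:value,key:value" tag list. */
final class S3TagListParser {

    private S3TagListParser() {}

    /** Parses "k1:v1,k2:v2" into an insertion-ordered map; rejects malformed entries. */
    static Map<String, String> parse(String tagList) {
        Map<String, String> tags = new LinkedHashMap<>();
        if (tagList == null || tagList.trim().isEmpty()) {
            return tags; // no tags configured
        }
        for (String entry : tagList.split(",")) {
            // Split on the first colon only, so values may themselves contain colons.
            int separator = entry.indexOf(':');
            if (separator < 0) {
                throw new IllegalArgumentException("Expected key:value but got '" + entry + "'");
            }
            String key = entry.substring(0, separator).trim();
            String value = entry.substring(separator + 1).trim();
            if (key.isEmpty()) {
                throw new IllegalArgumentException("Empty tag key in '" + entry + "'");
            }
            tags.put(key, value);
        }
        return tags;
    }
}
```

      Note that this format cannot represent tag values containing commas; an actual implementation would need to choose an escaping rule.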

      2) Expose withTagging from module

      The Hadoop module used by Flink's existing file system already exposes tagging at the put-request level. The Flink file system plugin could use this to expose these options. One possible approach is to incorporate the tags into the file path, e.g.,

      path = "TAGS:s3://bucket/path"

      Or possibly as an option that can be applied to the checkpoint and sink configurations, e.g.,

      env.getCheckpointingConfig().setS3Tags(TAGS) 

      and similar for a file sink.
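
      A minimal sketch of the path-prefix variant, assuming the tag prefix is separated from the real URI by a single colon placed directly before the scheme. This encoding and the class name are assumptions made for illustration; the issue does not specify either.

```java
/** Hypothetical holder for a path carrying a tag prefix, e.g. "owner:me:s3://bucket/path". */
final class TaggedPath {
    /** Raw, unparsed tag prefix; empty if the path had none. */
    final String tags;
    /** The actual file system URI with the tag prefix removed. */
    final String uri;

    private TaggedPath(String tags, String uri) {
        this.tags = tags;
        this.uri = uri;
    }

    /** Splits "TAGS:scheme://rest" into the tag prefix and "scheme://rest". */
    static TaggedPath parse(String path) {
        int schemeEnd = path.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Not a URI with a scheme: " + path);
        }
        // The last colon before the scheme's "://" ends the tag prefix, if there is one.
        int prefixEnd = path.lastIndexOf(':', schemeEnd - 1);
        if (prefixEnd < 0) {
            return new TaggedPath("", path);
        }
        return new TaggedPath(path.substring(0, prefixEnd), path.substring(prefixEnd + 1));
    }
}
```

      The sketch also shows the weakness of this approach: a tag value that itself contains "://" would be split incorrectly, which is one argument for the explicit configuration option instead.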

      Note: The lifecycle can also be managed using the module.

      Attachments

        1. FLINK-19589.patch
          4 kB
          Josh Mahonin

            People

              Josh Mahonin (jmahonin)
              Padarn Wilson (Padarn)
              Votes: 4
              Watchers: 13
