Flink / FLINK-5789

Make Bucketing Sink independent of Hadoop's FileSystem

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 1.1.4, 1.2.0
    • Fix Version/s: None
    • Component/s: Connectors / Common
    • Labels:
      None

      Description

      The BucketingSink is hard-wired to Hadoop's FileSystem, bypassing Flink's own file system abstraction.

      This causes several issues:

      • The BucketingSink behaves differently from other file sinks with respect to configuration.
      • File systems that Flink supports directly (not through Hadoop), such as the MapR File System, do not work with the BucketingSink in the same way as other file systems.
      • This is especially problematic given the effort to make Hadoop an optional dependency, and for deployments on other stacks (Mesos, Kubernetes, AWS, GCE, Azure) that ideally have no Hadoop dependency at all.

      We should port the BucketingSink to use Flink's FileSystem classes.

      To support the truncate functionality that the BucketingSink needs for its exactly-once semantics, we should extend Flink's FileSystem abstraction with the following methods:

      • boolean supportsTruncate()
      • void truncate(Path, long)
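      Truncate matters for exactly-once because, after a failure, an in-progress part file may contain bytes written after the last checkpoint; those records are replayed on restore, so the file must be cut back to its checkpointed length or they appear twice. The following is a minimal, self-contained sketch of how a sink might use the two proposed methods during recovery. It is illustrative only: `TruncatableFs`, `InMemoryFs`, `BucketRecovery`, `restore` and `validLengthPath` are hypothetical names, plain `String` paths stand in for Flink's `Path`, and the ".valid-length" marker used when truncate is unavailable is an assumed naming, similar in spirit to the valid-length file the BucketingSink writes when Hadoop's truncate is not available.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical abstraction: the two methods proposed in this issue plus what the sketch needs. */
interface TruncatableFs {
    long getLength(String path);
    boolean supportsTruncate();
    void truncate(String path, long length);
    void writeString(String path, String content);
}

/** Hypothetical in-memory file system, used only to exercise the recovery logic. */
final class InMemoryFs implements TruncatableFs {
    private final Map<String, byte[]> files = new HashMap<>();
    private final boolean truncateSupported;

    InMemoryFs(boolean truncateSupported) {
        this.truncateSupported = truncateSupported;
    }

    void put(String path, byte[] data) {
        files.put(path, data.clone());
    }

    byte[] get(String path) {
        return files.get(path);
    }

    @Override
    public long getLength(String path) {
        return files.get(path).length;
    }

    @Override
    public boolean supportsTruncate() {
        return truncateSupported;
    }

    @Override
    public void truncate(String path, long length) {
        if (!truncateSupported) {
            throw new UnsupportedOperationException("truncate is not supported");
        }
        // Keep only the first `length` bytes of the file.
        files.put(path, Arrays.copyOf(files.get(path), Math.toIntExact(length)));
    }

    @Override
    public void writeString(String path, String content) {
        files.put(path, content.getBytes(StandardCharsets.UTF_8));
    }
}

/** Hypothetical recovery step for a single in-progress part file. */
final class BucketRecovery {
    private BucketRecovery() {}

    // Assumed marker naming, for illustration only.
    static String validLengthPath(String partFile) {
        return partFile + ".valid-length";
    }

    /** Discards bytes written after the last checkpoint so replayed records are not duplicated. */
    static void restore(TruncatableFs fs, String partFile, long checkpointedLength) {
        long current = fs.getLength(partFile);
        if (current == checkpointedLength) {
            return; // nothing was written after the checkpoint
        }
        if (current < checkpointedLength) {
            throw new IllegalStateException("file is shorter than its checkpointed length");
        }
        if (fs.supportsTruncate()) {
            fs.truncate(partFile, checkpointedLength);
        } else {
            // No truncate: leave the file untouched and record how many bytes are valid.
            fs.writeString(validLengthPath(partFile), Long.toString(checkpointedLength));
        }
    }
}
```

      With truncate support, restoring a part file containing "hello world" to a checkpointed length of 5 leaves "hello"; without it, the file keeps all 11 bytes and the marker records 5, so readers must honor the marker.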

              People

              • Assignee: Unassigned
              • Reporter: Stephan Ewen (sewen)
              • Votes: 0
              • Watchers: 12

                Dates

                • Created:
                  Updated:
                  Resolved: