  1. Beam
  2. BEAM-12664

Improve textio: Write sharding

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-go
    • Labels: None

    Description

      The other SDKs have implementations that shard files on write. So should the Go SDK. The feature is mentioned in the Beam Programming Guide:

      https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files
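For illustration, here is a minimal standard-library sketch of what sharding on write involves: assigning each line to one of N shards and naming each shard file with the `-SSSSS-of-NNNNN` pattern that the other SDKs use by default. The function names (`shardName`, `shardFor`, `partition`) are hypothetical, and the hash-based assignment is an assumption chosen for simplicity; it is not how any existing SDK necessarily distributes elements.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardName formats one output filename as <prefix>-SSSSS-of-NNNNN<suffix>.
// prefix and suffix are hypothetical parameters for this sketch.
func shardName(prefix, suffix string, shard, numShards int) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}

// shardFor picks a shard index in [0, numShards) by hashing the line.
// Assumption: hashing is used here only to keep the sketch deterministic.
func shardFor(line string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(line))
	return int(h.Sum32() % uint32(numShards))
}

// partition groups lines by their shard index.
func partition(lines []string, numShards int) [][]string {
	shards := make([][]string, numShards)
	for _, l := range lines {
		i := shardFor(l, numShards)
		shards[i] = append(shards[i], l)
	}
	return shards
}

func main() {
	lines := []string{"alpha", "beta", "gamma", "delta", "epsilon"}
	for i, s := range partition(lines, 3) {
		// Prints each shard filename and how many lines landed in it.
		fmt.Println(shardName("out", ".txt", i, 3), len(s))
	}
}
```

In a real transform the per-shard grouping would be expressed as a keyed grouping step in the pipeline rather than an in-memory slice.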

      It would be more expedient to provide a cross-language (xlang) TextIO wrapper for the Go SDK than to replicate the implementation in Go, at the cost of some execution-time performance.

      Ideally it would be similarly generalized to simplify writing file sinks. File sinks are necessarily complex in order to provide a robust and reliable implementation.

      Current Go implementation:

      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119

      Python FileIO implementation:

      https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py 

      (Note: iobase.Sink is deprecated, but is still suitable for file IO.)

      Java TextIO & FileIO:

      https://github.com/apache/beam/blob/f8fbbfa309ac88848057de694d4cc1cba3eaa92a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TextIO.java#L1259 

      https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java 

       

      KafkaIO (an example of writing a Go SDK-side wrapper for an xlang Java IO):

      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go 

       

      General docs on writing sinks: https://beam.apache.org/documentation/io/developing-io-overview/#sinks 

       

          People

            Assignee: Unassigned
            Reporter: Robert Burke (lostluck)