Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
The other SDKs shard output files on write, and the Go SDK should as well. The feature is described in the Beam Programming Guide:
https://beam.apache.org/documentation/programming-guide/#file-based-writing-multiple-files
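For illustration, here is a minimal sketch in plain Go (no Beam APIs) of what sharding on write amounts to: lines are spread over a fixed number of output files named with the "-SSSSS-of-NNNNN" template that the Java and Python file sinks use by default. The helper names (shardName, shardFor, partition) and the hash-based assignment are assumptions made for this example; a real sink would let the runner decide how elements are distributed.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardName builds a file name using the "-SSSSS-of-NNNNN" shard template.
// The helper itself is hypothetical and not part of the Go SDK.
func shardName(prefix string, shard, numShards int, suffix string) string {
	return fmt.Sprintf("%s-%05d-of-%05d%s", prefix, shard, numShards, suffix)
}

// shardFor picks a shard for a line by hashing its content.
// Assumption: hashing is only one possible way to spread elements.
func shardFor(line string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(line))
	return int(h.Sum32() % uint32(numShards))
}

// partition groups lines by the name of the shard file they would be written to.
func partition(lines []string, numShards int, prefix, suffix string) map[string][]string {
	out := make(map[string][]string)
	for _, l := range lines {
		name := shardName(prefix, shardFor(l, numShards), numShards, suffix)
		out[name] = append(out[name], l)
	}
	return out
}

func main() {
	lines := []string{"alpha", "beta", "gamma", "delta"}
	for name, ls := range partition(lines, 3, "output", ".txt") {
		fmt.Println(name, ls)
	}
}
```

Each shard can then be written independently, which is what allows the write to be parallelized across workers.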
Providing a cross-language (Xlang) TextIO implementation for the Go SDK would be more expedient than replicating the implementation in Go, at the cost of some execution-time performance.
Ideally, it would be similarly generalized to simplify writing file sinks, which are necessarily complex if they are to be robust and reliable.
Current Go implementation:
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/textio/textio.go#L119
Python FileIO implementation:
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/filebasedsink.py
(Note: iobase.Sink is deprecated but is still suitable for file IO.)
Java TextIO & FileIO:
KafkaIO (example of writing a Go SDK-side wrapper for an xlang Java IO):
https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/io/xlang/kafkaio/kafka.go
General docs on writing sinks: https://beam.apache.org/documentation/io/developing-io-overview/#sinks