Details
New Feature
Status: Open
P3
Resolution: Unresolved
2.11.0
None
None
Description
From a discussion in https://github.com/apache/beam/pull/8097:
SpannerIO produces 2 output PCollections:
- getOutput() -> PCollection<Void>
  - Never has any values
  - In GlobalWindow
  - Closed when the input PCollection is closed (i.e. never in streaming), to indicate that all input has been written
  - Used in batch pipelines for 'dependent' bulk imports, where one dataset is not written to Spanner until another has finished writing (necessary for handling parent/child (1-many) referential integrity)
- getFailedMutations() -> PCollection<MutationGroup>
  - Only contains values when Mutation[Group]s fail to be written
  - In GlobalWindow
  - Not very useful, as the reason for the failure is not given
Suggestion:
- Deprecate these existing outputs.
- Make the primary output a PCollection<{ MutationGroup, CommitTimestamp }> so that the successfully written Mutation[Group]s can be processed further if necessary ({a,b} signifies a container class for these values).
- Add an additional output of failed mutations: PCollection<{ MutationGroup, FailureMessage }>.
- The existing outputs can be derived from these new outputs.
This allows useful error reporting/handling based on the failure message, and makes it possible to continue processing the successful mutations.
(see also BEAM-6887)
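The two proposed container classes, and the claim that the existing outputs can be derived from them, could be sketched roughly as follows. This is a plain-Java illustration only, not Beam code: `SpannerWriteSketch`, `Written`, `Failed`, `legacyFailedMutations` and `legacyOutput` are hypothetical names that do not exist in Beam, a `List` stands in for a PCollection, the type parameter `M` stands in for MutationGroup, and `java.time.Instant` is an assumed stand-in for the Spanner commit timestamp type.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: plain-Java stand-ins for the proposed SpannerIO outputs.
 * Every name in this class is hypothetical; none of them exist in Beam.
 */
final class SpannerWriteSketch {

  /** Proposed primary output element: { MutationGroup, CommitTimestamp }. */
  static final class Written<M> {
    final M mutationGroup;
    // Assumption: Instant stands in for the real commit timestamp type.
    final Instant commitTimestamp;

    Written(M mutationGroup, Instant commitTimestamp) {
      this.mutationGroup = mutationGroup;
      this.commitTimestamp = commitTimestamp;
    }
  }

  /** Proposed failure output element: { MutationGroup, FailureMessage }. */
  static final class Failed<M> {
    final M mutationGroup;
    final String failureMessage;

    Failed(M mutationGroup, String failureMessage) {
      this.mutationGroup = mutationGroup;
      this.failureMessage = failureMessage;
    }
  }

  /** Derives the existing getFailedMutations() contents by dropping the failure message. */
  static <M> List<M> legacyFailedMutations(List<Failed<M>> failed) {
    List<M> out = new ArrayList<>();
    for (Failed<M> f : failed) {
      out.add(f.mutationGroup);
    }
    return out;
  }

  /**
   * Derives the existing getOutput() contents. That output never holds values,
   * so the result is always empty; only its completion carries meaning.
   */
  static <M> List<Void> legacyOutput(List<Written<M>> written) {
    return new ArrayList<>();
  }

  private SpannerWriteSketch() {}
}
```

In a real pipeline the derivations would be element-wise transforms over PCollections rather than loops over lists, and the completion signal of the Void output would come from the runner rather than from a method return.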
Issue Links
- is duplicated by BEAM-6887 Streaming Spanner Writer transform (Resolved)