Details

    • Type: New Feature
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.11.0
    • Fix Version/s: None
    • Component/s: io-java-gcp
    • Labels: None

    Description

      From a discussion in https://github.com/apache/beam/pull/8097:
      SpannerIO currently produces two output PCollections:

      • getOutput() -> PCollection<Void>
        • Never contains any values
        • In the GlobalWindow
        • Closed when the input PCollection is closed (i.e. never in streaming), to indicate that all input has been written
        • Used in batch pipelines for 'dependent' bulk imports, where one dataset is not written to Spanner until another has finished writing. This is necessary for preserving parent/child (one-to-many) referential integrity.
      • getFailedMutations() -> PCollection<MutationGroup>
        • Only contains values when Mutation[Group]s fail to be written
        • In the GlobalWindow
        • Not very useful, as the reason for the failure is not given.

      Suggestion:

      • Deprecate these existing outputs.
      • Make the primary output a PCollection<{ MutationGroup, CommitTimestamp }>, so that successfully written Mutation[Group]s can be processed further if necessary.
        (Here {a, b} denotes a container class holding these values.)
      • Add an additional output for failed mutations: PCollection<{ MutationGroup, FailureMessage }>.
      • The existing outputs can be derived from these new outputs.

      This enables useful error reporting and handling based on the failure message, and lets the pipeline continue processing the successfully written mutations.
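
A minimal sketch of the proposed container classes, and of how the existing outputs could be derived from them. It is plain Java with no Beam dependency: lists stand in for PCollections, and every name here (MutationGroupStub, WrittenMutation, FailedMutation, LegacyOutputs) is hypothetical and not part of SpannerIO.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for Beam's MutationGroup; only a label is kept. */
final class MutationGroupStub {
  final String label;

  MutationGroupStub(String label) {
    this.label = label;
  }
}

/** Hypothetical element of the proposed primary output: { MutationGroup, CommitTimestamp }. */
final class WrittenMutation {
  final MutationGroupStub group;
  final Instant commitTimestamp;

  WrittenMutation(MutationGroupStub group, Instant commitTimestamp) {
    this.group = group;
    this.commitTimestamp = commitTimestamp;
  }
}

/** Hypothetical element of the proposed failure output: { MutationGroup, FailureMessage }. */
final class FailedMutation {
  final MutationGroupStub group;
  final String failureMessage;

  FailedMutation(MutationGroupStub group, String failureMessage) {
    this.group = group;
    this.failureMessage = failureMessage;
  }
}

/** Derives the shape of the two existing outputs from the proposed ones. */
final class LegacyOutputs {
  /** Like getFailedMutations(): keep the group, drop the failure message. */
  static List<MutationGroupStub> failedMutations(List<FailedMutation> failures) {
    return failures.stream().map(f -> f.group).collect(Collectors.toList());
  }

  /** Like getOutput(): never carries values, so every element is filtered out. */
  static List<Void> output(List<WrittenMutation> written) {
    return written.stream()
        .filter(w -> false)
        .map(w -> (Void) null)
        .collect(Collectors.toList());
  }
}
```

In a real pipeline these derivations would be transforms applied to the new PCollections, and the "closed" signal of getOutput() would come from the runner rather than from an empty list.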

       

      (see also BEAM-6887)

       

       

            People

              Assignee: Unassigned
              Reporter: Niel Markwick (nielm)