  Beam / BEAM-3516

SpannerWriteGroupFn does not respect mutation limits

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.9.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      When using SpannerIO.write(), a large batch, or a table with indexes, can easily exceed Spanner's per-transaction mutation limit and fail with the following error:

      Jan 02, 2018 2:42:59 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
      SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b): com.google.cloud.spanner.SpannerException: INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)

      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:119)
      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:43)
      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:80)
      at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:404)
      at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.commit(GrpcSpannerRpc.java:376)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:729)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:726)
      at com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:200)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:725)
      at com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:248)
      at com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:37)
      at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.flushBatch(SpannerWriteGroupFn.java:108)
      at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.processElement(SpannerWriteGroupFn.java:79)
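
      The limit is counted per cell, not per row or per byte. A minimal sketch of that counting rule in plain Java, assuming the com.google.cloud.spanner.Mutation API; the MutationCountEstimator name and the caller-supplied indexEntries guess are hypothetical, since the real index fan-out is only known server-side:

      import com.google.cloud.spanner.Mutation;

      class MutationCountEstimator {
        /**
         * Rough client-side estimate of how one mutation counts toward the
         * 20000-mutation transaction limit: inserts/updates count once per
         * affected column, deletes count as one regardless of columns.
         */
        static long estimate(Mutation m, long indexEntries) {
          if (m.getOperation() == Mutation.Op.DELETE) {
            return 1; // deletes and delete ranges count as a single mutation
          }
          long columns = 0;
          for (String ignored : m.getColumns()) {
            columns++; // e.g. 1 key column + 4 non-key columns = 5 mutations
          }
          // Index entries touched by the transaction also count toward the limit.
          return columns + indexEntries;
        }
      }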

       

      As a workaround, we can override withBatchSizeBytes with something much smaller:

      // 'mutations' is a PCollection<Mutation>
      mutations.apply("Write", SpannerIO
         .write()
         // Artificially reduce the max batch size because the batcher currently
         // doesn't take into account the 20000 mutation multiplicity limit
         .withBatchSizeBytes(1024) // 1KB
         .withProjectId("#PROJECTID#")
         .withInstanceId("#INSTANCE#")
         .withDatabaseId("#DATABASE#"));

      While this is not as efficient, it at least allows the pipeline to work consistently.
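
      A more robust direction would be to batch by estimated mutation count rather than by bytes. Below is a minimal sketch under that assumption; CountLimitedBatcher is a hypothetical name, MutationCountEstimator.estimate() is the sketch above, and the flush goes through DatabaseClient.writeAtLeastOnce(), the same call that fails in the stack trace:

      import java.util.ArrayList;
      import java.util.List;
      import com.google.cloud.spanner.DatabaseClient;
      import com.google.cloud.spanner.Mutation;

      class CountLimitedBatcher {
        private static final long MAX_MUTATIONS = 20000; // Spanner's per-transaction limit
        private final DatabaseClient client;
        private final List<Mutation> batch = new ArrayList<>();
        private long count = 0;

        CountLimitedBatcher(DatabaseClient client) {
          this.client = client;
        }

        void add(Mutation m, long indexEntries) {
          long cost = MutationCountEstimator.estimate(m, indexEntries);
          if (count + cost > MAX_MUTATIONS && !batch.isEmpty()) {
            flush(); // flush before the batch would exceed the limit
          }
          batch.add(m);
          count += cost;
        }

        void flush() {
          if (!batch.isEmpty()) {
            client.writeAtLeastOnce(batch); // same API SpannerWriteGroupFn.flushBatch uses
            batch.clear();
            count = 0;
          }
        }
      }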


          People

            Assignee: Niel Markwick (nielm)
            Reporter: Ryan Gordon (ryangordon)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 5h 20m