Details
Bug
Status: Resolved
P2
Resolution: Fixed
2.2.0
None
Description
When using SpannerIO.write() with a large batch, or with a table that has indexes, the write can easily exceed Spanner's per-transaction mutation limit and fail with the following error:
Jan 02, 2018 2:42:59 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b): com.google.cloud.spanner.SpannerException: INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:119)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:43)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:80)
at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:404)
at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.commit(GrpcSpannerRpc.java:376)
at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:729)
at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:726)
at com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:200)
at com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:725)
at com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:248)
at com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:37)
at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.flushBatch(SpannerWriteGroupFn.java:108)
at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.processElement(SpannerWriteGroupFn.java:79)
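To make the counting rule in the error message concrete, here is a minimal sketch of how a batch's mutation count could be estimated. The `Write` and `Op` types are hypothetical stand-ins, not the Spanner client's `Mutation` class, and index changes (which the error message says also count) are not modeled, so the real total for an indexed table would be higher. The limit is assumed to be exceeded only when the total goes above 20000, based on the "Maximum number" wording.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the counting rule quoted in the error message; all names here are hypothetical. */
final class MutationCountSketch {
    /** Limit taken from the error message above ("Maximum number: 20000"). */
    static final long MAX_MUTATIONS = 20000;

    enum Op { INSERT, UPDATE, DELETE, DELETE_RANGE }

    /** Stand-in for one write; NOT the Spanner client's Mutation class. */
    static final class Write {
        final Op op;
        final int columns; // key + non-key columns the write affects

        Write(Op op, int columns) {
            this.op = op;
            this.columns = columns;
        }
    }

    /** Deletes count once; inserts and updates count once per affected column. */
    static long count(Write w) {
        if (w.op == Op.DELETE || w.op == Op.DELETE_RANGE) {
            return 1;
        }
        return w.columns;
    }

    /** Sums the counts for a batch. Index changes are not modeled here. */
    static long total(List<Write> batch) {
        long sum = 0;
        for (Write w : batch) {
            sum += count(w);
        }
        return sum;
    }

    /** Assumes the commit fails only when the total is strictly above the limit. */
    static boolean exceedsLimit(List<Write> batch) {
        return total(batch) > MAX_MUTATIONS;
    }

    public static void main(String[] args) {
        List<Write> batch = new ArrayList<>();
        for (int i = 0; i < 4001; i++) {
            batch.add(new Write(Op.INSERT, 5)); // one key column + four non-key columns
        }
        System.out.println(total(batch) + " mutations, over limit: " + exceedsLimit(batch));
    }
}
```

Under this rule, a five-column insert costs five mutations, so a batch can only hold 4,000 such rows before any index overhead is counted, regardless of how few bytes those rows take up.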
As a workaround, we can set "withBatchSizeBytes" to a much smaller value:
mutations.apply("Write", SpannerIO
    .write()
    // Artificially reduce the max batch size b/c the batcher currently doesn't
    // take into account the 20000 mutation multiplicity limit
    .withBatchSizeBytes(1024) // 1KB
    .withProjectId("#PROJECTID#")
    .withInstanceId("#INSTANCE#")
    .withDatabaseId("#DATABASE#")
);
While this is less efficient, it at least allows the write to succeed consistently.
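For illustration only, a batcher that respects both limits might look roughly like the sketch below. This is not SpannerIO's implementation and not necessarily how the issue was fixed; `CellAwareBatcherSketch`, `Item`, and the per-item `cells` count are hypothetical names, and the caller is assumed to supply the mutation count for each item (including any index overhead it knows about).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batcher that flushes on a byte limit OR a mutation-count limit. */
final class CellAwareBatcherSketch {
    /** One buffered write: its encoded size and the mutations it counts as. */
    static final class Item {
        final long bytes;
        final long cells;

        Item(long bytes, long cells) {
            this.bytes = bytes;
            this.cells = cells;
        }
    }

    private final long maxBytes;
    private final long maxCells;
    private final List<List<Item>> flushed = new ArrayList<>();
    private List<Item> current = new ArrayList<>();
    private long curBytes = 0;
    private long curCells = 0;

    CellAwareBatcherSketch(long maxBytes, long maxCells) {
        this.maxBytes = maxBytes;
        this.maxCells = maxCells;
    }

    /** Flushes first if adding the item would push the batch past either limit. */
    void add(Item item) {
        boolean overBytes = curBytes + item.bytes > maxBytes;
        boolean overCells = curCells + item.cells > maxCells;
        if (!current.isEmpty() && (overBytes || overCells)) {
            flush();
        }
        current.add(item);
        curBytes += item.bytes;
        curCells += item.cells;
    }

    /** Closes the current batch; a real writer would commit it to Spanner here. */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        flushed.add(current);
        current = new ArrayList<>();
        curBytes = 0;
        curCells = 0;
    }

    List<List<Item>> batches() {
        return flushed;
    }
}
```

With a 1 MB byte limit and a 20000 mutation limit, small five-column rows would be split by the mutation limit long before the byte limit applies, which is the case the byte-only batcher misses. A single item that is over a limit on its own still ends up in its own batch and would need separate handling.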