  Beam / BEAM-3516

SpannerWriteGroupFn does not respect mutation limits

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.9.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      When using SpannerIO.write(), a large batch, or a table with indexes, can easily exceed Spanner's per-transaction mutation limit and fail with the following error:

      Jan 02, 2018 2:42:59 PM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
      SEVERE: 2018-01-02T22:42:57.873Z: (3e7c871d215e890b): com.google.cloud.spanner.SpannerException: INVALID_ARGUMENT: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The transaction contains too many mutations. Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting values into one key column and four non-key columns count as five mutations total for the insert. Delete and delete range operations count as one mutation regardless of the number of columns affected. The total mutation count includes any changes to indexes that the transaction generates. Please reduce the number of writes, or use fewer indexes. (Maximum number: 20000)

      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:119)
      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:43)
      at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:80)
      at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.get(GrpcSpannerRpc.java:404)
      at com.google.cloud.spanner.spi.v1.GrpcSpannerRpc.commit(GrpcSpannerRpc.java:376)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:729)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl$2.call(SpannerImpl.java:726)
      at com.google.cloud.spanner.SpannerImpl.runWithRetries(SpannerImpl.java:200)
      at com.google.cloud.spanner.SpannerImpl$SessionImpl.writeAtLeastOnce(SpannerImpl.java:725)
      at com.google.cloud.spanner.SessionPool$PooledSession.writeAtLeastOnce(SessionPool.java:248)
      at com.google.cloud.spanner.DatabaseClientImpl.writeAtLeastOnce(DatabaseClientImpl.java:37)
      at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.flushBatch(SpannerWriteGroupFn.java:108)
      at org.apache.beam.sdk.io.gcp.spanner.SpannerWriteGroupFn.processElement(SpannerWriteGroupFn.java:79)
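
      The limit is counted per cell, not per row or per byte. A minimal sketch of that counting rule in plain Java, assuming the com.google.cloud.spanner.Mutation API; the MutationCountEstimator name and the caller-supplied indexEntries guess are hypothetical, since the real index fan-out is only known server-side:

      import com.google.cloud.spanner.Mutation;

      class MutationCountEstimator {
        /**
         * Rough client-side estimate of how one mutation counts toward the
         * 20000-mutation transaction limit: inserts/updates count once per
         * affected column, deletes count as one regardless of columns.
         */
        static long estimate(Mutation m, long indexEntries) {
          if (m.getOperation() == Mutation.Op.DELETE) {
            return 1; // deletes and delete ranges count as a single mutation
          }
          long columns = 0;
          for (String ignored : m.getColumns()) {
            columns++; // e.g. 1 key column + 4 non-key columns = 5 mutations
          }
          // Index entries touched by the transaction also count toward the limit.
          return columns + indexEntries;
        }
      }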

       

      As a workaround, we can override withBatchSizeBytes with something much smaller:

      // 'mutations' is a PCollection<Mutation>
      mutations.apply("Write", SpannerIO
         .write()
         // Artificially reduce the max batch size because the batcher currently
         // doesn't take into account the 20000 mutation multiplicity limit
         .withBatchSizeBytes(1024) // 1KB
         .withProjectId("#PROJECTID#")
         .withInstanceId("#INSTANCE#")
         .withDatabaseId("#DATABASE#"));

      While this is not as efficient, it at least allows the pipeline to work consistently.
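
      A more robust direction would be to batch by estimated mutation count rather than by bytes. Below is a minimal sketch under that assumption; CountLimitedBatcher is a hypothetical name, MutationCountEstimator.estimate() is the sketch above, and the flush goes through DatabaseClient.writeAtLeastOnce(), the same call that fails in the stack trace:

      import java.util.ArrayList;
      import java.util.List;
      import com.google.cloud.spanner.DatabaseClient;
      import com.google.cloud.spanner.Mutation;

      class CountLimitedBatcher {
        private static final long MAX_MUTATIONS = 20000; // Spanner's per-transaction limit
        private final DatabaseClient client;
        private final List<Mutation> batch = new ArrayList<>();
        private long count = 0;

        CountLimitedBatcher(DatabaseClient client) {
          this.client = client;
        }

        void add(Mutation m, long indexEntries) {
          long cost = MutationCountEstimator.estimate(m, indexEntries);
          if (count + cost > MAX_MUTATIONS && !batch.isEmpty()) {
            flush(); // flush before the batch would exceed the limit
          }
          batch.add(m);
          count += cost;
        }

        void flush() {
          if (!batch.isEmpty()) {
            client.writeAtLeastOnce(batch); // same API SpannerWriteGroupFn.flushBatch uses
            batch.clear();
            count = 0;
          }
        }
      }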


          People

            Assignee: Niel Markwick (nielm)
            Reporter: Ryan Gordon (ryangordon)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 5h 20m