Details
- Type: Bug
- Status: Resolved
- Priority: P1
- Resolution: Fixed
- Affects Versions: 2.35.0, 2.36.0
- Fix Versions: None
Description
Note that BigQuery does not support requests over a certain size, and rows that exceed that size may be output to a dead-letter queue, from which users can get them back with BigQueryIO.Write.Result.getFailedInsertsWithErr.
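For reference, a minimal sketch of wiring that up (assuming streaming inserts; `rows` and `tableSpec` are placeholders, and extended error info must be enabled for the error-carrying variant):

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;

// `rows` is a PCollection<TableRow>; `tableSpec` is a "project:dataset.table" string.
WriteResult result =
    rows.apply(
        BigQueryIO.writeTableRows()
            .to(tableSpec)
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            // Extended error info is required for getFailedInsertsWithErr().
            .withExtendedErrorInfo()
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

// Rows BigQuery rejected, each paired with its insert error.
PCollection<BigQueryInsertError> failedRows = result.getFailedInsertsWithErr();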
A change went into Beam that outputs rows to the BigQueryIO DLQ even when they are meant to be retried indefinitely:
https://github.com/apache/beam/commit/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc#diff-26dbe8f625f702ae3edacdbc02b12acc6e423542fe16835229e22ef8eb4e109cR979-R989
A workaround is to set this pipeline option to a larger value: https://github.com/apache/beam/blob/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L70
The default is currently 64KB, which is relatively small. Setting it to 1MB or 5MB should work around this issue (the value should be larger than the maximum row size); gRPC should support request sizes up to 10MB.
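As a sketch, assuming the option at the line linked above is maxStreamingBatchSize (the byte-size cap on a single streaming insert batch in BigQueryOptions), the workaround can be applied programmatically:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

BigQueryOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(BigQueryOptions.class);
// Raise the cap from the 64KB default to 5MB; it should exceed the maximum
// row size, and gRPC should support request sizes up to 10MB.
options.setMaxStreamingBatchSize(5L * 1024L * 1024L);
Pipeline p = Pipeline.create(options);

Equivalently, the same option should be settable on the command line, e.g. --maxStreamingBatchSize=5242880.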