Beam / BEAM-13931

BigQueryIO is sending rows that are too large to Deadletter Queue even on RETRY_ALWAYS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P1
    • Resolution: Fixed
    • Affects Version/s: 2.35.0, 2.36.0
    • Fix Version/s: 2.37.0
    • Component/s: io-java-gcp
    • Labels: None

    Description

      Note that BigQuery does not support insert requests over a certain size, and rows that exceed that size may be output to a dead-letter queue, which users can then retrieve with WriteResult.getFailedInsertsWithErr().
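      For context, a minimal sketch of how that dead-letter output is consumed, assuming a streaming-insert write against an existing table; the table spec is a placeholder:

        import com.google.api.services.bigquery.model.TableRow;
        import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
        import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
        import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
        import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
        import org.apache.beam.sdk.values.PCollection;

        class FailedInsertsExample {
          /** Writes rows with streaming inserts and returns the dead-letter output. */
          static PCollection<BigQueryInsertError> writeAndGetFailures(PCollection<TableRow> rows) {
            WriteResult result =
                rows.apply(
                    "WriteToBQ",
                    BigQueryIO.writeTableRows()
                        .to("my-project:my_dataset.my_table") // placeholder table spec
                        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                        .withExtendedErrorInfo() // required for getFailedInsertsWithErr()
                        .withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry()));

            // Rows rejected by the streaming-insert path, together with the insert error.
            return result.getFailedInsertsWithErr();
          }
        }

      Even with alwaysRetry(), the change linked below routes oversized rows to this output instead of retrying them, which is the behavior reported here.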

      A change went into Beam that outputs rows to the BigQueryIO DLQ even when they are meant to be retried indefinitely.

      https://github.com/apache/beam/commit/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc#diff-26dbe8f625f702ae3edacdbc02b12acc6e423542fe16835229e22ef8eb4e109cR979-R989
       
       
      A workaround is to set this pipeline option to a larger value: https://github.com/apache/beam/blob/1f08d1f3ddc2e7bc7341be4b29bdafaec18de9cc/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java#L70
       

      Currently it's 64KB, which is relatively small. Setting it to 1MB or 5MB or so should work around this issue (it should be larger than the maximum row size); gRPC should support request sizes of up to 10MB.
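      A minimal sketch of applying that workaround in code, assuming the option at the linked line is BigQueryOptions#maxStreamingBatchSize (verify the name against the SDK version in use):

        import org.apache.beam.sdk.Pipeline;
        import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
        import org.apache.beam.sdk.options.PipelineOptionsFactory;

        class MaxBatchSizeWorkaround {
          public static void main(String[] args) {
            // View the pipeline options as BigQueryOptions to reach the GCP-specific settings.
            BigQueryOptions options =
                PipelineOptionsFactory.fromArgs(args).withValidation().as(BigQueryOptions.class);

            // Assumption: the linked option is maxStreamingBatchSize (bytes), default 64KB.
            // Raise it above the largest expected row, e.g. 5MB, so oversized-but-retriable
            // rows are not routed to the dead-letter queue.
            options.setMaxStreamingBatchSize(5L * 1024L * 1024L);

            Pipeline pipeline = Pipeline.create(options);
            // ... attach BigQueryIO.write() and the rest of the pipeline here ...
            pipeline.run();
          }
        }

      Equivalently, the value can be passed on the command line, e.g. --maxStreamingBatchSize=5242880 (again assuming that option name).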

       

      Attachments

        Activity

          People

            Assignee: Pablo Estrada (pabloem)
            Reporter: Pablo Estrada (pabloem)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 20m