  Beam / BEAM-13158

"Dead letter" handling for problem rows in BigQueryIO Storage Write API

Details

    • Type: New Feature
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • None
    • None
    • Component/s: io-java-gcp
    • None

    Description

      A single invalid row causes the BigQueryIO transform, and with it the whole pipeline, to fail. The desired behavior is configurable error handling: either fail on any validation failure (the current behavior) or return the failed rows through the WriteResult.
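
To make the requested behavior concrete, here is a minimal plain-Java sketch of the two modes. It is not Beam code and does not reflect how BigQueryIO is implemented: `DeadLetterSketch`, `ErrorMode`, `FailedRow`, `Result`, and `write` are all hypothetical names chosen for illustration, and the validator is a stand-in for whatever check rejects a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch; none of these names are part of Beam's API. */
final class DeadLetterSketch {

  /** The two behaviors described in the issue. */
  enum ErrorMode { FAIL_ON_ERROR, RETURN_FAILED_ROWS }

  /** A row that could not be written, paired with the reason. */
  static final class FailedRow<T> {
    final T row;
    final String error;

    FailedRow(T row, String error) {
      this.row = row;
      this.error = error;
    }
  }

  /** Stand-in for a write result that also exposes the rejected rows. */
  static final class Result<T> {
    final List<T> written = new ArrayList<>();
    final List<FailedRow<T>> failed = new ArrayList<>();
  }

  /**
   * Checks every row with the validator, which returns null for a valid row
   * or an error message for an invalid one. Nothing is actually sent anywhere;
   * valid rows are only collected in the result.
   */
  static <T> Result<T> write(List<T> rows, Function<T, String> validator, ErrorMode mode) {
    Result<T> result = new Result<>();
    for (T row : rows) {
      String error = validator.apply(row);
      if (error == null) {
        result.written.add(row);
      } else if (mode == ErrorMode.FAIL_ON_ERROR) {
        // Current behavior: the first bad row aborts everything.
        throw new IllegalArgumentException(error);
      } else {
        // Requested behavior: keep going and report the bad row to the caller.
        result.failed.add(new FailedRow<>(row, error));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> rows = Arrays.asList("10.0.0.1", "not-an-ip-address-and-far-too-long");
    Function<String, String> maxLength15 =
        r -> r.length() <= 15 ? null : "value has length " + r.length() + ", maximum is 15";
    Result<String> result = write(rows, maxLength15, ErrorMode.RETURN_FAILED_ROWS);
    System.out.println("written=" + result.written.size() + " failed=" + result.failed.size());
  }
}
```

With `RETURN_FAILED_ROWS` the caller can route `result.failed` to a dead-letter destination of its choice; with `FAIL_ON_ERROR` the same input throws on the second row.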

      The exception can originate in two places: the JSON-to-protobuf conversion and the BigQuery backend.

      Example of the exception caused by the conversion:

      io.grpc.StatusRuntimeException: INVALID_ARGUMENT: The proto field mismatched with BigQuery field at D586b3f9a_1543_4dbe_87ff_ef786d6803c2.bytes_sent, the proto field type string, BigQuery field type INTEGER Entity: projects/event-processing-demo/datasets/bigquery_io/tables/events/streams/Cic2MzUyMTYxYy0wMDAwLTI2MjktOGVjYy1mNDAzMDQ1ZWY5Y2U6czI
      

      Example of the exception caused by the BigQuery backend: 

      io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Field dst_ip: STRING(15) has maximum length 15 but got a value with length 54 Entity: projects/event-processing-demo/datasets/bigquery_io/tables/events/streams/CiQ2MzRkOGM5Mi0wMDAwLTI2MjktOGVjYy1mNDAzMDQ1ZWY5Y2U
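
The two example failures above can each be expressed as a per-row check that yields an error message instead of throwing, which is the shape a dead-letter output would need. The sketch below is plain Java under stated assumptions: `RowCheckSketch` and its methods are hypothetical, the field names `bytes_sent` and `dst_ip` and the length limit 15 are taken from the exception messages, and null values are treated as invalid for simplicity. A client-side check like this is not a substitute for the backend's own validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; these names are not Beam or BigQuery APIs. */
final class RowCheckSketch {

  /** Rejects a value that is not an integer type, as in the bytes_sent example. */
  static String checkInteger(String field, Object value) {
    if (value instanceof Integer || value instanceof Long) {
      return null;
    }
    String actual = value == null ? "null" : value.getClass().getSimpleName();
    return "Field " + field + ": expected INTEGER but got " + actual;
  }

  /** Rejects a string longer than maxLength, as in the dst_ip example. */
  static String checkMaxLength(String field, Object value, int maxLength) {
    if (!(value instanceof String)) {
      return "Field " + field + ": expected STRING";
    }
    int length = ((String) value).length();
    if (length <= maxLength) {
      return null;
    }
    return "Field " + field + ": STRING(" + maxLength + ") has maximum length "
        + maxLength + " but got a value with length " + length;
  }

  /** Returns the first problem found in the row, or null if both checks pass. */
  static String validate(Map<String, Object> row) {
    String error = checkInteger("bytes_sent", row.get("bytes_sent"));
    if (error == null) {
      error = checkMaxLength("dst_ip", row.get("dst_ip"), 15);
    }
    return error;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("bytes_sent", "1024"); // a string where an INTEGER is expected
    row.put("dst_ip", "10.0.0.1");
    System.out.println(validate(row));
  }
}
```

A row for which `validate` returns a non-null message could be emitted, together with that message, to a failed-rows output rather than failing the pipeline.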
      

          People

            Assignee: Unassigned
            Reporter: Sergei Lilichenko (slilichenko)
            Votes: 0
            Watchers: 3
