Beam / BEAM-14364

404s in BigQueryIO don't get output to Failed Inserts PCollection

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P1
    • Resolution: Unresolved
    • Component: io-py-gcp

Description

  Given that BigQueryIO is configured with createDisposition(CREATE_NEVER),
  and the DynamicDestinations class returns null for a schema,
  and the table for that destination does not exist in BigQuery,
  when I stream records to BigQuery for that table,
  then the write should fail and the failed rows should appear on the Failed Inserts output PCollection (via getFailedInserts()).

  Almost all of the time the table exists beforehand, but since new tables can be created we want this behavior to be non-explosive to the job. What we are seeing instead is that processing completely stops in those pipelines and the jobs eventually run out of memory. I feel the appropriate action when BigQuery returns a 404 for the table would be to output those failed TableRows to the failed-inserts PCollection and continue processing as normal.
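
  For reference, a minimal sketch of the configuration described above, written against the Java SDK. The class and method names, the `rows` input, the "target_table" routing field, the table spec, and the retryTransientErrors() policy are illustrative assumptions, not taken from this issue:

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

public class FailedInsertsSketch {

  /** Applies the write described in this issue and returns the failed-inserts output. */
  public static PCollection<TableRow> writeWithCreateNever(PCollection<TableRow> rows) {
    WriteResult result =
        rows.apply(
            "WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to(
                    new DynamicDestinations<TableRow, String>() {
                      @Override
                      public String getDestination(ValueInSingleWindow<TableRow> element) {
                        // Route each record to a table derived from the element itself,
                        // so previously unseen (possibly non-existent) tables can appear.
                        return (String) element.getValue().get("target_table");
                      }

                      @Override
                      public TableDestination getTable(String destination) {
                        return new TableDestination(
                            "my-project:my_dataset." + destination, null);
                      }

                      @Override
                      public TableSchema getSchema(String destination) {
                        // As described above: no schema is supplied.
                        return null;
                      }
                    })
                .withCreateDisposition(CreateDisposition.CREATE_NEVER)
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

    // Expected: rows that fail to insert (including the 404 case above) surface here.
    // Observed: processing stalls instead, and the job eventually runs out of memory.
    return result.getFailedInserts();
  }
}
{code}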

Attachments

  1. ErrorsInPrototypeJob.PNG (18 kB, attached by Darren Norton)


People

  Assignee: Svetak Sundhar
  Reporter: Svetak Sundhar
  Votes: 0
  Watchers: 6
