Description
Load jobs into BigQuery have a number of limitations: https://cloud.google.com/bigquery/quotas#load_jobs
Currently, the Python BQ sink implemented in `bigquery_file_loads.py` does not handle these limitations well. Improvements need to be made to the implementation to:
- Decide dynamically, at pipeline execution time, whether to use temp_tables.
- Add code to determine when a load job to a single destination needs to be partitioned into multiple jobs.
- When a load is partitioned this way, we definitely need to use temp_tables, to cover the case where one of the resulting load jobs fails and the pipeline is rerun.
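A rough sketch of the partitioning decision described above, not the actual `bigquery_file_loads.py` code. The names `partition_files`, `needs_temp_tables`, `MAX_FILES_PER_JOB` and `MAX_BYTES_PER_JOB` are hypothetical, and the limit values are assumptions that should be checked against the quotas page linked above.

```python
from typing import Iterable, List, Tuple

# Assumed per-load-job limits; verify against the BigQuery quotas page.
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 15 * (1 << 40)  # 15 TiB


def partition_files(
    files: Iterable[Tuple[str, int]],
    max_files: int = MAX_FILES_PER_JOB,
    max_bytes: int = MAX_BYTES_PER_JOB,
) -> List[List[str]]:
    """Greedily group (path, size_in_bytes) pairs into per-job batches.

    A new batch is started whenever adding the next file would exceed
    either the file-count limit or the byte-size limit. A single file
    larger than max_bytes still gets a batch of its own.
    """
    partitions: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        over_count = len(current) >= max_files
        over_bytes = current_bytes + size > max_bytes
        if current and (over_count or over_bytes):
            partitions.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        partitions.append(current)
    return partitions


def needs_temp_tables(partitions: List[List[str]]) -> bool:
    """Temp tables are required once a destination needs more than one job."""
    return len(partitions) > 1
```

With this shape, the temp_tables decision falls out of the partitioning result at execution time: a destination whose files fit in one batch can be loaded directly, while any destination split into several batches goes through temp tables.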
Tanay, would you be able to look at this?