Beam / BEAM-7742

Make BigQuery file loads work well with load job size limits

Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.16.0
    • Component/s: io-py-gcp
    • Labels: None

    Description

      Load jobs into BigQuery have a number of limitations: https://cloud.google.com/bigquery/quotas#load_jobs
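
      For reference, the limits most relevant here can be captured as module constants. A minimal sketch, assuming the values documented on the quota page above at the time of writing; the names are illustrative, not the identifiers actually used in `bigquery_file_loads.py`:

      ```python
      # Illustrative constants mirroring the BigQuery load job quotas linked above.
      # Values are the documented limits at the time of writing; the names are
      # hypothetical, not those used in bigquery_file_loads.py.
      MAX_SOURCE_URIS_PER_JOB = 10 * 1000      # max source URIs per job configuration
      MAX_BYTES_PER_LOAD_JOB = 15 * (1 << 40)  # 15 TB max total bytes per load job
      ```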

       

      Currently, the Python BigQuery sink implemented in `bigquery_file_loads.py` does not handle these limitations well. The implementation needs to be improved to:

      • Decide dynamically at pipeline execution time whether to use temp_tables.
      • Add code to determine when a load job to a single destination needs to be partitioned into multiple load jobs (see the sketch after this list).
      • When a load is partitioned, temp_tables must be used, so that if one of the load jobs fails and the pipeline is rerun, no partial data is left in the final destination.
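
      A minimal sketch of the partitioning step, assuming per-file sizes are known up front and reusing the illustrative limit constants from above; this is not the actual Beam implementation:

      ```python
      from typing import Iterable, List, Tuple

      # Hypothetical limits; see the constants sketched earlier.
      MAX_SOURCE_URIS_PER_JOB = 10 * 1000
      MAX_BYTES_PER_LOAD_JOB = 15 * (1 << 40)


      def partition_files(files: Iterable[Tuple[str, int]]) -> List[List[str]]:
          """Greedily groups (uri, size_bytes) pairs for one destination into
          partitions that each fit within a single load job's limits."""
          partitions: List[List[str]] = []
          current: List[str] = []
          current_bytes = 0
          for uri, size in files:
              # Flush the current partition if adding this file would exceed
              # either the file-count or the total-size limit. A single file
              # larger than the size limit still gets its own partition, since
              # one file cannot be split across load jobs.
              if current and (len(current) >= MAX_SOURCE_URIS_PER_JOB
                              or current_bytes + size > MAX_BYTES_PER_LOAD_JOB):
                  partitions.append(current)
                  current, current_bytes = [], 0
              current.append(uri)
              current_bytes += size
          if current:
              partitions.append(current)
          return partitions
      ```

      If this returns more than one partition for a destination, each partition would be loaded into its own temporary table, and the temporary tables copied into the final destination only after every load succeeds; a rerun after a partial failure then recreates the temp tables instead of appending duplicate rows directly to the destination table.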

      Tanay, would you be able to look at this?


    People

      Assignee: Tanay Tummalapalli (ttanay)
      Reporter: Pablo Estrada (pabloem)
      Votes: 1
      Watchers: 2


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 5h
