Beam / BEAM-7742

BigQuery File Loads to work well with load job size limits


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.16.0
    • Component/s: io-py-gcp
    • Labels: None

      Description

      Load jobs into BigQuery have a number of limitations: https://cloud.google.com/bigquery/quotas#load_jobs


      Currently, the Python BigQuery sink implemented in `bigquery_file_loads.py` does not handle these limits well. The implementation should be improved to:

      • Decide whether to use temp_tables dynamically at pipeline execution time.
      • Determine when a load job into a single destination needs to be partitioned into multiple load jobs.
      • When partitioning happens, temp_tables must be used, in case one of the partitioned load jobs fails and the pipeline is rerun.
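      The partitioning decision above could be sketched as a greedy grouping of a destination's files into per-job batches. This is an illustrative sketch, not the actual Beam implementation: the function name and the limit constants are assumptions chosen for illustration (BigQuery enforces per-load-job limits on file count and total size; see the quotas page linked above for the authoritative values).

      ```python
      # Assumed limits for illustration only; consult the BigQuery quotas
      # page for the real per-load-job limits.
      MAX_FILES_PER_JOB = 10_000
      MAX_BYTES_PER_JOB = 15 * 2**40  # 15 TB

      def partition_files(file_sizes):
          """Greedily split [(path, size_bytes), ...] for one destination
          into groups, each small enough for a single load job."""
          partitions = [[]]
          current_bytes = 0
          for path, size in file_sizes:
              current = partitions[-1]
              # Start a new partition if adding this file would exceed
              # either the file-count or the total-byte limit.
              if current and (len(current) >= MAX_FILES_PER_JOB
                              or current_bytes + size > MAX_BYTES_PER_JOB):
                  partitions.append([])
                  current_bytes = 0
                  current = partitions[-1]
              current.append(path)
              current_bytes += size
          return partitions
      ```

      Whenever this returns more than one partition for a destination, the pipeline would need temp_tables, so that a rerun after a partial failure does not leave duplicate rows in the final table.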

      Tanay, would you be able to look at this?


              People

              • Assignee: ttanay Tanay Tummalapalli
              • Reporter: pabloem Pablo Estrada
              • Votes: 1
              • Watchers: 3

                Dates

                • Created:
                • Updated:
                • Resolved:

                  Time Tracking

                  • Estimated: Not Specified
                  • Remaining: 0h
                  • Logged: 5h