Beam / BEAM-7742

Make BigQuery file loads work well with load job size limits

Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.16.0
    • Component/s: io-py-gcp
    • Labels: None

    Description

      Load jobs into BigQuery have a number of limitations: https://cloud.google.com/bigquery/quotas#load_jobs
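
      For reference, the limits most relevant here can be captured as module constants. A minimal sketch, assuming the values documented on the quota page above at the time of writing; the names are illustrative, not the identifiers actually used in `bigquery_file_loads.py`:

      ```python
      # Illustrative constants mirroring the BigQuery load job quotas linked above.
      # Values are the documented limits at the time of writing; the names are
      # hypothetical, not those used in bigquery_file_loads.py.
      MAX_SOURCE_URIS_PER_JOB = 10 * 1000      # max source URIs per job configuration
      MAX_BYTES_PER_LOAD_JOB = 15 * (1 << 40)  # 15 TB max total bytes per load job
      ```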

       

      Currently, the Python BigQuery sink implemented in `bigquery_file_loads.py` does not handle these limitations well. The implementation needs to be improved to:

      • Decide dynamically at pipeline execution time whether to use temp_tables.
      • Add code to determine when a load job to a single destination needs to be partitioned into multiple load jobs (see the sketch after this list).
      • When a load is partitioned, temp_tables must be used, so that if one of the load jobs fails and the pipeline is rerun, no partial data is left in the final destination.
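
      A minimal sketch of the partitioning step, assuming per-file sizes are known up front and reusing the illustrative limit constants from above; this is not the actual Beam implementation:

      ```python
      from typing import Iterable, List, Tuple

      # Hypothetical limits; see the constants sketched earlier.
      MAX_SOURCE_URIS_PER_JOB = 10 * 1000
      MAX_BYTES_PER_LOAD_JOB = 15 * (1 << 40)


      def partition_files(files: Iterable[Tuple[str, int]]) -> List[List[str]]:
          """Greedily groups (uri, size_bytes) pairs for one destination into
          partitions that each fit within a single load job's limits."""
          partitions: List[List[str]] = []
          current: List[str] = []
          current_bytes = 0
          for uri, size in files:
              # Flush the current partition if adding this file would exceed
              # either the file-count or the total-size limit. A single file
              # larger than the size limit still gets its own partition, since
              # one file cannot be split across load jobs.
              if current and (len(current) >= MAX_SOURCE_URIS_PER_JOB
                              or current_bytes + size > MAX_BYTES_PER_LOAD_JOB):
                  partitions.append(current)
                  current, current_bytes = [], 0
              current.append(uri)
              current_bytes += size
          if current:
              partitions.append(current)
          return partitions
      ```

      If this returns more than one partition for a destination, each partition would be loaded into its own temporary table, and the temporary tables copied into the final destination only after every load succeeds; a rerun after a partial failure then recreates the temp tables instead of appending duplicate rows directly to the destination table.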

      Tanay, would you be able to look at this?


    People

      Assignee: Tanay Tummalapalli (ttanay)
      Reporter: Pablo Estrada (pabloem)
      Votes: 1
      Watchers: 2


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 5h
