Beam / BEAM-9078

Large Tarball Artifacts Should Use GCS Resumable Upload

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.17.0
    • Fix Version/s: 2.19.0
    • Component/s: runner-dataflow
    • Labels: None
    • Flags: Patch

      Description

      The tarball uploaded to GCS can be quite large, for example when a user vendors multiple dependencies into the tarball to produce a more stable deployable artifact.

      Before this change, the GCS upload API call performed a multipart upload, which the Google [documentation](https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload) recommends only when the file is small enough to upload again if the connection fails. For large tarballs, we hit 60-second socket timeouts before the multipart upload completes. By passing `total_size`, apitools first checks whether the size exceeds its resumable-upload threshold and, if it does, performs the more robust resumable upload instead of a multipart upload, avoiding the socket timeouts.
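
      Below is a minimal sketch of the idea, not the actual Beam patch: it assumes apitools' `transfer.Upload.FromStream` API, and the helper name `make_tarball_upload` is hypothetical. Supplying `total_size` lets apitools compare the file size against its resumable-upload threshold and choose the resumable strategy for large tarballs.

```python
# Sketch only (hypothetical helper): pass total_size so apitools can
# select a resumable upload for a large tarball instead of multipart.
import os

from apitools.base.py import transfer


def make_tarball_upload(tarball_path, mime_type='application/octet-stream'):
    total_size = os.path.getsize(tarball_path)
    stream = open(tarball_path, 'rb')  # apitools reads the bytes from this stream
    # With total_size known, apitools checks it against its resumable-upload
    # threshold and uses the resumable strategy for large files; without it,
    # a multipart upload may be attempted and can hit socket timeouts before
    # the whole tarball is sent.
    return transfer.Upload.FromStream(
        stream, mime_type, total_size=total_size)
```

      The returned upload object could then be passed to the apitools-generated GCS storage client's insert call, similar to how the Dataflow runner's stager uploads artifacts.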

    People

    • Assignee: Brad West
    • Reporter: Brad West
    • Votes: 0
    • Watchers: 2

    Dates

    • Created:
    • Updated:
    • Resolved:

    Time Tracking

    • Original Estimate: 1h
    • Remaining Estimate: 20m
    • Time Spent: 40m