  Beam / BEAM-1153

GcsUtil needs to set timeout and retry explicitly in BatchRequest.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P0
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: sdk-java-core

    Description

      Non-batch requests use RetryHttpRequestInitializer, which sets the read timeout to 80 seconds and performs more retries.

      The auto-generated Google Cloud JSON client library does not set an HttpRequestInitializer for batch requests.

      GcsUtil uses storageClient.batch(), which is defined here:
      https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256

      Without the HttpRequestInitializer, the default read timeout is 20 seconds.
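
      The difference between the two code paths can be sketched with a small stand-alone model. This is only an illustration: SketchRequest, SketchClient, and RETRY_INITIALIZER are hypothetical stand-ins, not the real google-api-java-client or Beam classes, and the retry count is an assumed placeholder. The 20-second and 80-second values are the ones quoted in this description.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for an HTTP request; not the real client class. */
final class SketchRequest {
    // Default read timeout quoted in the description: 20 seconds.
    int readTimeoutMillis = 20_000;
    // Placeholder only; not the library's real default retry count.
    int maxRetries = 0;
}

/** Hypothetical stand-in for a storage client that builds batch requests. */
final class SketchClient {
    /** Models batch() without an initializer: defaults are left untouched. */
    SketchRequest batch() {
        return batch(null);
    }

    /** Models batch(initializer): the initializer, if any, configures the request. */
    SketchRequest batch(Consumer<SketchRequest> initializer) {
        SketchRequest request = new SketchRequest();
        if (initializer != null) {
            initializer.accept(request);
        }
        return request;
    }
}

final class BatchTimeoutSketch {
    /** Plays the role of RetryHttpRequestInitializer: 80 s read timeout plus retries. */
    static final Consumer<SketchRequest> RETRY_INITIALIZER = request -> {
        request.readTimeoutMillis = 80_000;
        request.maxRetries = 3; // assumed value, for illustration only
    };

    public static void main(String[] args) {
        SketchClient client = new SketchClient();
        // Without an initializer the batch request keeps the 20 s default.
        System.out.println(client.batch().readTimeoutMillis);
        // With the initializer the batch request matches the non-batch path.
        System.out.println(client.batch(RETRY_INITIALIZER).readTimeoutMillis);
    }
}
```

      In the model, the only change needed on the caller's side is to pass the initializer explicitly when creating the batch, which is the shape of fix the title asks for.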

      A possible fix: https://github.com/apache/incubator-beam/pull/1608

      In addition, we can partially roll back https://github.com/apache/incubator-beam/pull/1359 so that fileSize() keeps using the non-batch API for single files. This ensures that existing code keeps working the same way.
      PR: https://github.com/apache/incubator-beam/pull/1611
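
      The partial rollback amounts to a dispatch on the number of paths. The sketch below assumes two hypothetical callbacks, nonBatchLookup and batchLookup, standing in for the non-batch and batch code paths; neither name comes from GcsUtil, and the sizes returned in main are fake.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class FileSizeDispatchSketch {
    /**
     * Routes a single path through the non-batch lookup and any other
     * number of paths through the batch lookup. Both lookups are
     * hypothetical callbacks supplied by the caller.
     */
    static List<Long> fileSizes(
            List<String> paths,
            Function<String, Long> nonBatchLookup,
            Function<List<String>, List<Long>> batchLookup) {
        if (paths.size() == 1) {
            return Collections.singletonList(nonBatchLookup.apply(paths.get(0)));
        }
        return batchLookup.apply(paths);
    }

    public static void main(String[] args) {
        // Fake lookups: the non-batch path reports 1, the batch path reports 2.
        Function<String, Long> nonBatch = path -> 1L;
        Function<List<String>, List<Long>> batch =
                paths -> Collections.nCopies(paths.size(), 2L);

        System.out.println(fileSizes(Arrays.asList("gs://bucket/a"), nonBatch, batch));
        System.out.println(
                fileSizes(Arrays.asList("gs://bucket/a", "gs://bucket/b"), nonBatch, batch));
    }
}
```

      Keeping the single-file case on the non-batch path means callers that only ever ask for one file see the same timeout and retry behavior as before the batch API was introduced.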

          People

            Assignee: Pei He (peihe0@gmail.com)
            Reporter: Pei He (peihe0@gmail.com)