Beam / BEAM-2264

Re-use credential instead of generating a new one on each GCS call

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.15.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      We should cache the credential used within a Pipeline and re-use it instead of generating a new one on each GCS call. When executing (against 2.0.0 RC2):

      python -m apache_beam.examples.wordcount --input "gs://dataflow-samples/shakespeare/*" --output local_counts
      

      Note that we seemingly generate a new access token on every call instead of only when a refresh is required.

        super(GcsIO, cls).__new__(cls, storage_client))
      INFO:root:Starting the size estimation of the input
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.286200046539 seconds
      INFO:root:Running pipeline with DirectRunner.
      INFO:root:Starting the size estimation of the input
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:root:Finished the size estimation of the input at 43 files. Estimation took 0.205624818802 seconds
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      INFO:oauth2client.client:Refreshing access_token
      INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
      ... many more times ...
      

          People

            Assignee: udim Udi Meiri
            Reporter: lcwik Luke Cwik
            Votes: 1
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 40m