Beam / BEAM-7411

Allow uploading gzipped files via apache_beam.io.gcp.gcsio.GcsIO with proper content-encoding

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: io-py-gcp
    • Labels: None

    Description

      To reduce the size of uploaded files, we decided to gzip them before upload. Unfortunately, we noticed that the metadata of the uploaded files does not include content-encoding 'gzip'. I rechecked the code and found that there is no way to pass gzip encoding to

      apache_beam.io.gcp.gcsio.GcsIO.open()

      Also, I noticed that apache_beam.io.gcp.gcsio.GcsUploader doesn't support uploading gzipped files.

      To resolve this problem, we need to allow passing a gzip_encoded option, which can then be forwarded to apitools.base.py.transfer in

      GcsUploader.__init__()
      

      Is there any chance that the required changes could be applied soon?

      What are the steps to reproduce the problem?
      1. Prepare a gzip-encoded file, for example a PDF
      2. Upload it to GCS using:

      from apache_beam.io.gcp.gcsio import GcsIO

      def upload_gzipped_pdf(gzipped_pdf, path):
        # gzipped_pdf: bytes that are already gzip-compressed; path: a gs:// object path.
        with GcsIO().open(path, 'w') as f:
          f.write(gzipped_pdf)
      

      3. Try to download the uploaded file via a browser
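
Step 1 above can be sketched with the standard library alone. This is a minimal illustration, not part of the original report: the `pdf_bytes` payload is a made-up stand-in for a real PDF.

```python
import gzip

# Stand-in bytes for a real PDF (an assumption for illustration only).
pdf_bytes = b"%PDF-1.4\n" + b"example page content\n" * 200

# Compress in memory; the result is what would be handed to f.write() in step 2.
gzipped_pdf = gzip.compress(pdf_bytes)

# Every gzip stream starts with the two magic bytes 0x1f 0x8b.
is_gzip = gzipped_pdf[:2] == b"\x1f\x8b"

# Decompressing restores the original bytes exactly.
round_trip_ok = gzip.decompress(gzipped_pdf) == pdf_bytes
```

The magic-byte check is a quick way to confirm that what reaches the upload call really is gzip data rather than the raw document.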

      What is the expected result?
      The file content is displayed properly.

      What happens instead?
      The downloaded document is broken.
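
A rough way to see why the document looks broken: a client only decompresses a response body when the response declares a gzip content-encoding. The sketch below simulates that behaviour with the standard library; `body_as_seen_by_client` is a hypothetical helper, not a Beam or GCS function.

```python
import gzip

def body_as_seen_by_client(stored_bytes, content_encoding=None):
    """Hypothetical model of a client handling a response body.

    The body is decompressed only when content-encoding is 'gzip';
    otherwise the stored bytes are passed through untouched.
    """
    if content_encoding == "gzip":
        return gzip.decompress(stored_bytes)
    return stored_bytes

original = b"%PDF-1.4\nexample content\n"  # stand-in for a real PDF
stored = gzip.compress(original)           # what was uploaded

# No content-encoding metadata: the viewer receives raw gzip bytes.
without_header = body_as_seen_by_client(stored)

# With content-encoding 'gzip': the viewer receives the original document.
with_header = body_as_seen_by_client(stored, content_encoding="gzip")
```

Under this model, the missing metadata is the whole difference between a readable PDF and an unreadable gzip stream.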

       

      Possible resolution after implementing expected changes

      from apache_beam.io.gcp.gcsio import GcsIO

      def upload_gzipped_pdf(gzipped_pdf, path):
        # gzip_encoded is the option proposed in this issue; it does not exist yet.
        with GcsIO().open(path, 'w', gzip_encoded=True) as f:
          f.write(gzipped_pdf)
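
One possible shape for the requested change, sketched with stand-in code rather than Beam's real classes: `SketchUploader` and `sketch_open` are hypothetical names, and the way the flag maps to metadata is an assumption about how the option could be threaded from open() down to the uploader's __init__().

```python
import io

class SketchUploader:
    """Hypothetical stand-in for an uploader; this is not Beam's GcsUploader."""

    def __init__(self, path, mime_type, gzip_encoded=False):
        self.path = path
        # Assumption: object metadata uses the GCS JSON field names.
        self.metadata = {"contentType": mime_type}
        if gzip_encoded:
            # Mark the payload as gzip so clients know to decompress it.
            self.metadata["contentEncoding"] = "gzip"
        self.buffer = io.BytesIO()

    def write(self, data):
        # Collect the bytes in memory; a real uploader would stream them.
        self.buffer.write(data)

def sketch_open(path, mode="w", mime_type="application/octet-stream",
                gzip_encoded=False):
    """Hypothetical open() that forwards gzip_encoded to the uploader."""
    if mode != "w":
        raise ValueError("this sketch only covers write mode")
    return SketchUploader(path, mime_type, gzip_encoded=gzip_encoded)

# Example path is made up for illustration.
uploader = sketch_open("gs://example-bucket/file.pdf", "w", gzip_encoded=True)
uploader.write(b"\x1f\x8b...already gzipped bytes...")
```

Defaulting `gzip_encoded` to False keeps existing callers unaffected, which is likely what a backwards-compatible change would need.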
      

          People

            Assignee: Unassigned
            Reporter: Pavlo Zhukov
            Votes: 0
            Watchers: 2