Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
To reduce the size of uploaded files, we decided to gzip them before upload. Unfortunately, the uploaded files' metadata does not include the content-encoding 'gzip'. I rechecked the code and found that there is no way to pass a gzip encoding option to
apache_beam.io.gcp.gcsio.GcsIO.open()
I also noticed that apache_beam.io.gcp.gcsio.GcsUploader doesn't support uploading gzipped files.
To resolve this problem, we need to allow a gzip_encoded option that can be passed through to apitools.base.py.transfer in
GcsUploader.__init__()
Is there any possibility that you could apply the required changes soon?
What steps to reproduce the problem?
1. Prepare a gzip-encoded file, for example a PDF
2. Upload it to GCS using
from apache_beam.io.gcp.gcsio import GcsIO

def upload_gzipped_pdf(gzipped_pdf, path):
    with GcsIO().open(path, 'w') as f:
        f.write(gzipped_pdf)
3. Try to download the uploaded file via a browser
What is the expected result?
The file content is displayed properly
What happens instead?
The downloaded document is broken
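A likely explanation, stated as an assumption rather than a confirmed diagnosis: without content-encoding 'gzip' in the object metadata, the browser receives the raw gzip stream and hands it to the PDF viewer undecoded. The sketch below uses only the Python standard library and a placeholder payload (not a real PDF) to show that gzipped bytes no longer carry the PDF signature until they are decompressed. The helper names looks_like_pdf and looks_like_gzip are illustrative only.

```python
import gzip

# Placeholder payload (assumption): real PDF files start with the same "%PDF-" marker.
pdf_bytes = b"%PDF-1.4\n% placeholder body, not a valid PDF\n"

# Step 1 of the reproduction: gzip the file content before upload.
gzipped_pdf = gzip.compress(pdf_bytes)


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF signature."""
    return data.startswith(b"%PDF-")


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic number (1f 8b)."""
    return data[:2] == b"\x1f\x8b"


# What a client sees when nothing tells it to decode the stream.
print(looks_like_pdf(gzipped_pdf))    # False
print(looks_like_gzip(gzipped_pdf))   # True

# What a client sees once the gzip layer is removed.
print(looks_like_pdf(gzip.decompress(gzipped_pdf)))  # True
```

The same check can be run on a downloaded object to confirm whether it arrived still compressed.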
Possible resolution after implementing expected changes
from apache_beam.io.gcp.gcsio import GcsIO

def upload_gzipped_pdf(gzipped_pdf, path):
    with GcsIO().open(path, 'w', gzip_encoded=True) as f:
        f.write(gzipped_pdf)
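The call above implies that the flag is threaded through two layers: the open() call and the uploader constructor, which then hands it to the transfer object. A minimal sketch of that plumbing follows. Every name in it (StubTransfer, SketchUploader, sketch_open, the example path) is a hypothetical stand-in written for illustration; none of it is Beam or apitools code, and the real change may look different.

```python
from dataclasses import dataclass


@dataclass
class StubTransfer:
    """Hypothetical stand-in for the transfer object; it only records the flag."""
    mime_type: str
    gzip_encoded: bool = False


class SketchUploader:
    """Hypothetical stand-in for an uploader whose __init__ accepts the new option."""

    def __init__(self, path: str, mime_type: str, gzip_encoded: bool = False):
        self.path = path
        # Forward the flag to the transfer layer instead of dropping it here.
        self.transfer = StubTransfer(mime_type=mime_type, gzip_encoded=gzip_encoded)


def sketch_open(path, mode="w", mime_type="application/octet-stream",
                gzip_encoded=False):
    """Hypothetical stand-in for an open() call in write mode."""
    if mode != "w":
        raise ValueError("this sketch only covers write mode")
    return SketchUploader(path, mime_type, gzip_encoded=gzip_encoded)


# Example path is a placeholder, not a real bucket.
uploader = sketch_open("gs://example-bucket/report.pdf", "w", gzip_encoded=True)
print(uploader.transfer.gzip_encoded)  # True
```

Defaulting gzip_encoded to False at every layer keeps existing callers unchanged, which is the main design point of the sketch.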