Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Description
S3 supports tagging objects to better organize data (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html). We rely on these tags for efficient downstream processing and inventory management.
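For reference, S3 expects a tag set as URL-encoded query parameters (the format of the x-amz-tagging request header), and it limits an object to 10 tags, with keys of at most 128 characters and values of at most 256. The sketch below, assuming those documented limits, shows the encoding a writer would have to produce; the function name encode_s3_tags is hypothetical and not part of Arrow.

```python
from typing import Mapping
from urllib.parse import quote, urlencode

# Limits from the S3 object-tagging user guide.
MAX_TAGS = 10
MAX_KEY_LEN = 128
MAX_VALUE_LEN = 256


def encode_s3_tags(tags: Mapping[str, str]) -> str:
    """Hypothetical helper: encode tags as "key1=value1&key2=value2".

    This is the URL query-string form S3 accepts for object tagging.
    """
    if len(tags) > MAX_TAGS:
        raise ValueError(f"S3 allows at most {MAX_TAGS} tags per object, got {len(tags)}")
    for key, value in tags.items():
        if not 1 <= len(key) <= MAX_KEY_LEN:
            raise ValueError(f"tag key must be 1-{MAX_KEY_LEN} characters: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"tag value for {key!r} exceeds {MAX_VALUE_LEN} characters")
    # quote (not quote_plus) turns spaces into %20; safe='' (the urlencode
    # default) also percent-encodes '&', '=' and '/'.
    return urlencode(tags, quote_via=quote)
```

For example, encode_s3_tags({"team": "data", "stage": "raw"}) returns "team=data&stage=raw".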
Currently, Arrow/PyArrow does not allow tags to be set when writing objects. As a result, after every PyArrow-based process runs, we have to scan the bucket and re-apply the tags.
I looked through the code, and I think this could be done via the metadata mechanism.
The tags need to be added to the CreateMultipartUploadRequest here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156
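One way the metadata route could work, sketched under assumptions: the S3 filesystem reserves one metadata key for tags, removes it before the remaining entries are mapped to object headers as they are today, and passes its value to CreateMultipartUploadRequest (the AWS SDK for C++ request type has a tagging field set via SetTagging). The key name "x-amz-tagging" is hypothetical, chosen to mirror the S3 header; it is not an option Arrow is known to recognize. The Python below only models the splitting step.

```python
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical reserved metadata key carrying the URL-encoded tag set.
TAGGING_KEY = "x-amz-tagging"


def split_tagging(metadata: Mapping[str, str]) -> Tuple[Optional[str], Dict[str, str]]:
    """Separate the hypothetical tagging entry from the other metadata.

    Returns (tagging, remaining): tagging is the encoded tag string, or None
    if absent; remaining holds every other key unchanged so it can still be
    mapped to object headers.
    """
    remaining = dict(metadata)  # copy so the caller's mapping is untouched
    tagging = remaining.pop(TAGGING_KEY, None)
    return tagging, remaining
```

From PyArrow, a caller would then pass something like metadata={"x-amz-tagging": "team=data&stage=raw"} to FileSystem.open_output_stream, which already takes a metadata argument; the reserved key itself is the part this issue would need to add.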