  1. Apache Arrow
  2. ARROW-16746

[C++][Python] S3 tag support on write


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, Python

    Description

      S3 allows objects to be tagged to better organize one's data (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html). We use this for efficient downstream processes and inventory management.

      Currently Arrow/PyArrow does not allow tags to be added on write. This forces us to scan the bucket and re-apply the tags after a PyArrow-based process has run.
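
      For illustration, the re-tagging pass described above could look roughly like the sketch below. It assumes a boto3-style S3 client is passed in; the function name `retag_prefix` and its parameters are hypothetical and not part of Arrow.

```python
def retag_prefix(client, bucket, prefix, tags):
    """Re-apply `tags` to every object under `prefix` (hypothetical helper).

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the number of objects that were tagged.
    """
    tag_set = [{"Key": key, "Value": value} for key, value in tags.items()]
    paginator = client.get_paginator("list_objects_v2")
    tagged = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            # Note: this replaces any tag set already on the object.
            client.put_object_tagging(
                Bucket=bucket,
                Key=obj["Key"],
                Tagging={"TagSet": tag_set},
            )
            tagged += 1
    return tagged
```

      This costs one extra request per object after every write, which is the overhead that tagging at write time would avoid.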

      I looked through the code and think that it could potentially be done via the metadata mechanism.

      The tags need to be added to the CreateMultipartUploadRequest here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc#L1156

      See also

      http://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_s3_1_1_model_1_1_create_multipart_upload_request.html#af791f34a65dc69bd681d6995313be2da
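
      As a sketch of what would have to be passed along: S3 expects the tag set on an upload request as a URL-encoded query string (for example `key1=value1&key2=value2`). The helper below builds such a string; the name `encode_s3_tagging` is hypothetical, and how the value would reach the upload request (for instance through the metadata mechanism mentioned above) is a proposal, not existing Arrow behaviour.

```python
from urllib.parse import quote, urlencode


def encode_s3_tagging(tags):
    """Encode a dict of tags as the query string S3 expects (hypothetical helper)."""
    if len(tags) > 10:
        # S3 limits each object to 10 tags.
        raise ValueError("S3 allows at most 10 tags per object")
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    return urlencode(tags, quote_via=quote)


tagging = encode_s3_tagging({"project": "arrow", "stage": "raw data"})
print(tagging)  # project=arrow&stage=raw%20data
```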


          People

            Assignee: Unassigned
            Reporter: André Kelpe (fs111)
