  1. Apache Arrow
  2. ARROW-15892

[C++] Dataset APIs require s3:ListBucket Permissions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 8.0.0
    • Component/s: C++

    Description

      Hi team, this is my first time posting an issue, so I apologize if the format is lacking. I originally commented on ARROW-13685 and on a related GitHub issue.

      Long story short, our environment is heavily locked down: my application has permission to write data under an S3 prefix, but it does not have the s3:ListBucket permission, and I cannot add it. This does not prevent me from using the single-file APIs such as pq.write_table, but the "dataset" APIs fail because their bucket validation logic tries to test for the bucket's existence.

      pq.write_to_dataset(pa.Table.from_batches([data]), location, filesystem=s3fs)
      OSError: When creating bucket '<my bucket>': AWS Error [code 15]: Access Denied

      The same is true for the generic pyarrow.dataset APIs. My understanding is that the bucket validation logic is part of the C++ code, not the Python API. As a Pythonista who knows nothing of C++, I am not sure how to resolve this problem.
       
      Would it be possible to disable the bucket existence check with an optional keyword argument? Thank you for your time!
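
      On versions affected by this check, one possible workaround is to reproduce a Hive-style partitioned layout by hand with the single-file API, which the description above reports as working without s3:ListBucket. The sketch below is an illustration only, not how pq.write_to_dataset is implemented: the helper names (group_by_partition, partition_path, write_partitioned) are hypothetical, and it assumes a pyarrow version that provides pa.Table.from_pylist and that pq.write_table does not trigger the bucket check in your environment.

```python
import uuid
from collections import defaultdict


def group_by_partition(rows, partition_cols):
    """Group row dicts by their partition values, dropping the partition columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple((col, row[col]) for col in partition_cols)
        groups[key].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


def partition_path(base, key, filename):
    """Build a Hive-style object key such as 'bucket/prefix/year=2021/part.parquet'."""
    segments = [f"{col}={value}" for col, value in key]
    return "/".join([base.rstrip("/"), *segments, filename])


def write_partitioned(rows, base, partition_cols, filesystem):
    """Write one Parquet file per partition using only the single-file API.

    Hypothetical helper: it assumes pq.write_table succeeds with the caller's
    permissions, as reported in the issue description.
    """
    # Imported lazily so the pure helpers above can be used without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    written = []
    for key, group in group_by_partition(rows, partition_cols).items():
        path = partition_path(base, key, f"{uuid.uuid4().hex}.parquet")
        pq.write_table(pa.Table.from_pylist(group), path, filesystem=filesystem)
        written.append(path)
    return written
```

      With the names from the failing call above, this could be invoked as write_partitioned(table.to_pylist(), location, ["year"], s3fs), where table is the pa.Table being written and "year" stands in for whatever partition column the data actually has. The partition columns are encoded in the object key rather than stored in each file, which mirrors the usual Hive-style convention.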
       


            People

              Assignee: Sanjiban Sengupta (sanjibansg)
              Reporter: Jonny Fuller (mRWaffles)


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 7h 40m