Apache Arrow / ARROW-17045

[C++] Reject trailing slashes on file path


Details

    Description

      The filesystem implementations behaved inconsistently when given file paths with trailing slashes: LocalFileSystem returned an IOError, S3 trimmed off the trailing slash, and GCS kept the trailing slash as part of the file name (which later caused confusion, since the file would be labelled a "directory" in list calls). This PR moves all of them to the LocalFileSystem behavior: return an IOError.
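A minimal sketch of the unified rule, written in plain Python rather than Arrow's C++: a file path that ends in `/` is rejected with an error, instead of being trimmed (old S3 behavior) or kept as part of the name (old GCS behavior). The helper name `check_file_path` is hypothetical, and raising `OSError` is an assumption standing in for Arrow's IOError status.

```python
def check_file_path(path: str) -> str:
    """Reject file paths that end with a slash (hypothetical helper).

    Every filesystem applies the same rule: no trimming, no keeping the
    slash as part of the object name, just an error.
    """
    if path.endswith("/"):
        # Stand-in for Arrow's IOError status (assumption).
        raise OSError(f"File path must not end with a slash: {path!r}")
    return path


if __name__ == "__main__":
    print(check_file_path("py_test/test.txt"))  # accepted unchanged
    try:
        check_file_path("py_test/test.txt/")
    except OSError as exc:
        print("rejected:", exc)
```

Failing loudly, rather than silently normalizing, means a caller who passes a directory-looking path to a file API finds out immediately instead of getting a different object than intended.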

      The R filesystem bindings relied on the S3 behavior, so they now trim the trailing slash themselves before passing the path down to C++.
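The binding-side change can be sketched the same way, again in Python rather than R: the wrapper normalizes the path before delegating to the underlying filesystem call, so callers that relied on S3's trimming keep working even though C++ now rejects the slash. Both `trim_trailing_slash` and `open_output_stream_trimmed` are hypothetical names, and the `open_fn` callable stands in for the real C++-backed method; this is not the actual R binding code.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def trim_trailing_slash(path: str) -> str:
    """Remove trailing slashes; a path made only of slashes becomes "/"."""
    trimmed = path.rstrip("/")
    if not trimmed and path:
        return "/"
    return trimmed


def open_output_stream_trimmed(open_fn: Callable[[str], T], path: str) -> T:
    """Trim the path in the binding layer, then delegate to the filesystem.

    `open_fn` is a stand-in for the underlying filesystem method, which now
    rejects trailing slashes on its own.
    """
    return open_fn(trim_trailing_slash(path))


if __name__ == "__main__":
    # Record the path that would reach the underlying filesystem.
    received = []
    open_output_stream_trimmed(received.append, "py-test/test.txt/")
    print(received)
```

Keeping the normalization in the binding, rather than in C++, lets the core library stay strict while each language binding decides how forgiving its own API should be.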

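      Here is an example of the differences in behavior between S3 and GCS before this change, run against local emulators (MinIO for S3 on port 9000, a GCS emulator on port 9001):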

      import pyarrow.fs
      from pyarrow.fs import FileSelector
      from datetime import timedelta
      
      gcs = pyarrow.fs.GcsFileSystem(
          endpoint_override="localhost:9001",
          scheme="http",
          anonymous=True,
          retry_time_limit=timedelta(seconds=1),
      )
      
      gcs.create_dir("py_test")
      
      # Writing to test.txt with and without a trailing slash produces both a file and a directory!?
      with gcs.open_output_stream("py_test/test.txt") as out_stream:
          out_stream.write(b"Hello world!")
      with gcs.open_output_stream("py_test/test.txt/") as out_stream:
          out_stream.write(b"Hello world!")
      gcs.get_file_info(FileSelector("py_test"))
      # [<FileInfo for 'py_test/test.txt': type=FileType.File, size=12>, <FileInfo for 'py_test/test.txt': type=FileType.Directory>]
      
      s3 = pyarrow.fs.S3FileSystem(
          access_key="minioadmin",
          secret_key="minioadmin",
          scheme="http",
          endpoint_override="localhost:9000",
          allow_bucket_creation=True,
          allow_bucket_deletion=True,
      )
      
      s3.create_dir("py-test")
      
      # Writing to test.txt with and without a trailing slash writes to the same file
      with s3.open_output_stream("py-test/test.txt") as out_stream:
          out_stream.write(b"Hello world!")
      with s3.open_output_stream("py-test/test.txt/") as out_stream:
          out_stream.write(b"Hello world!")
      s3.get_file_info(FileSelector("py-test"))
      # [<FileInfo for 'py-test/test.txt': type=FileType.File, size=12>]
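
Why the GCS listing shows two entries named `py_test/test.txt` can be illustrated with a toy model of a flat object store. It assumes, as is common for object stores, that an object whose name ends in `/` is reported as a directory marker and that the listing strips that slash from the displayed name. This is a simplified model, not Arrow's GcsFileSystem or S3FileSystem code; `list_bucket` and `write` are hypothetical names.

```python
from typing import Dict, List, Tuple


def list_bucket(bucket: str, objects: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """List top-level entries of a toy flat object store (hypothetical model).

    Object names ending in "/" are treated as directory markers: they are
    reported as "Directory" with the trailing slash stripped from the name.
    """
    entries = []
    for name in sorted(objects):
        display = name.rstrip("/")
        if "/" in display:
            continue  # nested object; only top-level entries are listed
        kind = "Directory" if name.endswith("/") else "File"
        entries.append((f"{bucket}/{display}", kind))
    return entries


def write(objects: Dict[str, bytes], name: str, data: bytes, trim: bool) -> None:
    """Store `data` under `name`, trimming a trailing slash only if `trim`."""
    key = name.rstrip("/") if trim else name
    objects[key] = data


# Old GCS behavior: the trailing slash stayed in the object name.
gcs: Dict[str, bytes] = {}
write(gcs, "test.txt", b"Hello world!", trim=False)
write(gcs, "test.txt/", b"Hello world!", trim=False)
print(list_bucket("py_test", gcs))
# [('py_test/test.txt', 'File'), ('py_test/test.txt', 'Directory')]

# Old S3 behavior: the slash was trimmed, so both writes hit the same object.
s3: Dict[str, bytes] = {}
write(s3, "test.txt", b"Hello world!", trim=True)
write(s3, "test.txt/", b"Hello world!", trim=True)
print(list_bucket("py-test", s3))
# [('py-test/test.txt', 'File')]
```

In the GCS-like case the two objects are distinct keys, but once the directory-marker slash is stripped for display they collide on the same name, which is the confusion the issue describes.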
      


            People

              Will Jones (wjones127)
              Votes: 0
              Watchers: 2


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 50m