Apache Arrow / ARROW-13699

[Python][Doc] Refactor the FileSystem Interface documentation

Details

    Description

      As a Python developer working with different cloud vendors and their storage services, I'd like to jump quickly to code examples that show how to read and write files on each filesystem.

      The documentation concerned is the Python filesystems page: https://arrow.apache.org/docs/python/filesystems.html

      The information there is somewhat scattered, and it could be improved by adopting the following organisation:

      Filesystem Interface
        overview of the PyArrow FS interface

        Usage

          Local Filesystem
            description
            Writing files: code example
            Listing files: code example
            Reading files: code example

          S3 Filesystem
            description / configuration
            Writing files: code example
            Listing files: code example
            Reading files: code example

          Hadoop Filesystem
            description / configuration
            Writing files: code example
            Listing files: code example
            Reading files: code example

        Extending to fsspec-compatible filesystems
          description
          Google Cloud Storage: code example
          Azure: code example

      That way, a developer working with S3 can jump directly to the section of interest and start experimenting with the code examples.
      Additionally, if new Python bindings are created for an "Arrow native" filesystem, the documentation can be extended with a new section in the same vein as the others.

            People

              Nlte Nathanael Leaute
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h