Apache Arrow / ARROW-1858

[Python] Add documentation about parquet.write_to_dataset and related methods

    Details

      Description

      pyarrow can do more than write a single Parquet file: it can also write only the schema metadata for a full multi-file dataset. Such a dataset can also be partitioned automatically by one or more columns. At the moment, this functionality is hard to discover in the documentation. It is mainly covered by the API documentation, but we should have a short tutorial-like section that explains the differences between these functions and the use cases for each.

      See also https://stackoverflow.com/questions/47482434/can-pyarrow-write-multiple-parquet-files-to-a-folder-like-fastparquets-file-sch

              People

              • Assignee: Donal Simmie (dsimmie)
              • Reporter: Wes McKinney (wesmckinn)
