[ARROW-1400] [Python] Ability to create partitions when writing to Parquet - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.6.0
Fix Version/s: 0.7.0
Component/s: Python
Labels:
None
Environment:
Mac OS Sierra 10.12.6

External issue URL:
https://github.com/apache/arrow/issues/15457

Description

I'm fairly new to pyarrow, so I apologize if this is already a feature, but I couldn't find a solution in the documentation or an existing issue. Basically, I'm trying to export pandas DataFrames to .parquet files with partitions. I can see that pyarrow.parquet can read partitioned .parquet datasets, but there's no indication that it can write them. For example, it would be nice if pyarrow.parquet.write_table() had a parameter that took a list of columns to partition the table by, similar to the "partitionBy" parameter of PySpark's df.write.parquet().

Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition

People

Assignee: Safyre Anderson

Reporter: Safyre Anderson

Votes: 0

Watchers: 4

Dates

Created: 23/Aug/17 01:02

Updated: 11/Jan/23 07:14

Resolved: 04/Sep/17 02:37