Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 0.11.1
Description
Currently, to write files incrementally to S3, the following pattern appears necessary:

import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

def write_dfs_to_s3(dfs, fname):
    first_df = dfs[0]
    table = pa.Table.from_pandas(first_df, preserve_index=False)
    fs = s3fs.S3FileSystem()
    fh = fs.open(fname, 'wb')
    with pq.ParquetWriter(fh, table.schema) as writer:
        # set the file handle on the writer so the writer manages
        # closing it when it is itself closed
        writer.file_handle = fh
        writer.write_table(table=table)
        for df in dfs[1:]:
            table = pa.Table.from_pandas(df, preserve_index=False)
            writer.write_table(table=table)
This works as expected, but it is quite roundabout. It would be much easier if `ParquetWriter` accepted `filesystem` as a keyword argument in its constructor. In that case, the result of `_get_fs_from_path` would be overridden by the usual pattern: take the kwarg and validate it with `_ensure_filesystem` before use.