Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
Description
When converting a memory-mapped Arrow file to a Parquet file, pq.write_table loads the whole table into RAM. This effectively negates the point of memory mapping.
If this is not a bug, is there a proper way of converting a memory-mapped Arrow file to Parquet without using excessive memory?
Example code:
import pyarrow as pa
import pyarrow.ipc
import pyarrow.parquet as pq

source = pa.memory_map(path_to_arrow_file, 'r')
table = pa.ipc.RecordBatchFileReader(source).read_all()
# The following line loads the whole table into RAM
pq.write_table(table, path_to_parquet)