Details
- Type: Improvement
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
None
Description
The Python API and documentation regarding chunk sizes are confusing, in my opinion.
Example:
    def write_table(self, Table table, chunksize=None):
        """
        Write RecordBatch to stream
        Parameters
        ----------
        batch : RecordBatch
This suggests that the file will be written with a fixed chunk size, when in fact the chunksize parameter is only an upper bound on the size of the chunks to be written.
In my opinion, this parameter should be renamed to max_chunksize to avoid confusion and to reflect its true purpose.
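To illustrate the difference between a fixed chunk size and an upper bound, here is a minimal pure-Python sketch. It is not the pyarrow implementation: split_lengths is a hypothetical helper, and it assumes a table whose existing chunks are split independently, so no output batch ever spans two input chunks.

```python
from typing import List, Optional


def split_lengths(chunk_lengths: List[int],
                  max_chunksize: Optional[int] = None) -> List[int]:
    """Hypothetical helper: lengths of the batches that would be written.

    Each existing chunk is split on its own, so max_chunksize only caps
    the batch length; it does not force every batch to that length.
    """
    if max_chunksize is None:
        # No cap: existing chunk boundaries are kept unchanged.
        return list(chunk_lengths)
    if max_chunksize <= 0:
        raise ValueError("max_chunksize must be positive")

    batch_lengths = []
    for length in chunk_lengths:
        offset = 0
        while offset < length:
            step = min(max_chunksize, length - offset)
            batch_lengths.append(step)
            offset += step
    return batch_lengths


# A table stored as two chunks of 5 and 3 rows.
table_chunks = [5, 3]
print(split_lengths(table_chunks, max_chunksize=4))   # [4, 1, 3]
print(split_lengths(table_chunks, max_chunksize=10))  # [5, 3]
```

With a cap of 4, the batches have 4, 1 and 3 rows rather than a uniform 4, 4; with a cap of 10, the original chunks pass through untouched. A name like max_chunksize describes this behaviour more accurately than chunksize.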
This would also improve naming consistency in the code base, since in the C++ implementation this parameter is already named max_chunksize in cpp/src/arrow/ipc/writer.cc:
Status RecordBatchWriter::WriteTable(const Table& table, int64_t max_chunksize)
Similarly, the parameter should be renamed in pyarrow.Table.to_batches(self, chunksize=None).