Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Version: 6.0.0
Description
from io import BytesIO

import pandas as pd
import pyarrow
from pyarrow import parquet
from pyarrow import fs

print(pyarrow.__version__)


def check_row_groups_created(size: int):
    # Build a single-column table with `size` rows.
    df = pd.DataFrame({"a": range(size)})
    t = pyarrow.Table.from_pandas(df)
    # Write to an in-memory buffer, requesting one row group for all rows.
    buffer = BytesIO()
    parquet.write_table(t, buffer, row_group_size=size)
    buffer.seek(0)
    # Print the file metadata, including num_row_groups.
    print(parquet.read_metadata(buffer))


check_row_groups_created(50_000_000)
check_row_groups_created(100_000_000)
Output (the second call reports num_row_groups: 2, even though row_group_size equals the number of rows):
6.0.0
<pyarrow._parquet.FileMetaData object at 0x7f838584ab80>
  created_by: parquet-cpp-arrow version 6.0.0
  num_columns: 1
  num_rows: 50000000
  num_row_groups: 1
  format_version: 1.0
  serialized_size: 1493
<pyarrow._parquet.FileMetaData object at 0x7f838584ab80>
  created_by: parquet-cpp-arrow version 6.0.0
  num_columns: 1
  num_rows: 100000000
  num_row_groups: 2
  format_version: 1.0
  serialized_size: 1640