Details
- Type: Bug
- Status: Open
- Priority: Minor
- Resolution: Unresolved
- Affects Version/s: 1.0.0, 1.0.1, 2.0.0, 3.0.0
Description
I noticed a significant performance regression in version 1.0.0+ when loading wide dataframes.
For example, the following should reproduce it:
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(np.random.rand(100, 10000))
table = pa.Table.from_pandas(df)
pq.write_table(table, "temp.parquet")

%timeit pd.read_parquet("temp.parquet")
In version 0.17.0 this takes about 300-400 ms; in 1.0.0 and every later version it takes around 2 seconds.
Thanks for looking into this.
Issue Links
- is related to ARROW-12736 [C++] Eliminate unnecessary copy in FieldPath::Get() (Resolved)