Details
Bug
Status: Resolved
Major
Resolution: Fixed
0.15.0
python3.7
Description
Hi,
I just noticed that reading a parquet file with pandas became really slow after I upgraded to 0.15.0.
Example:
With 0.14.1
In [4]: %timeit df = pd.read_parquet(path)
2.02 s ± 47.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With 0.15.0
In [5]: %timeit df = pd.read_parquet(path)
22.9 s ± 478 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The file is about 15MB in size. I am testing on the same machine using the same version of python and pandas.
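For anyone who wants to reproduce the comparison outside IPython, something like the following might work. It is only a sketch built on the standard-library timeit module: time_call and time_read_parquet are hypothetical helper names (not part of pandas or pyarrow), and the path you pass in is whatever parquet file you are testing with.

```python
import statistics
import timeit


def time_call(fn, repeat=7, number=1):
    """Call fn `number` times per round for `repeat` rounds.

    Returns (mean, population std dev) of the time per call, in seconds.
    """
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


def time_read_parquet(path, repeat=7):
    """Time pd.read_parquet(path), roughly like `%timeit` with 7 runs of 1 loop."""
    # Imported lazily so that time_call stays usable without pandas installed.
    import pandas as pd

    return time_call(lambda: pd.read_parquet(path), repeat=repeat)
```

Running time_read_parquet(path) once in an environment with 0.14.1 and once with 0.15.0 should give numbers comparable to the %timeit output above, as long as the machine, file, and pandas version stay the same.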
Have you received similar complaints? What could be the issue here?
Thanks a lot.
Edit1:
Some profiling I did:
0.14.1:
0.15.0:
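One way to collect a comparable profile under each version is the standard-library cProfile module. This is a minimal sketch and not necessarily how the profiles in this report were produced; profile_call is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Profile a single call to fn and return the report as text.

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        # Always stop profiling, even if fn raises.
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

For example, print(profile_call(lambda: pd.read_parquet(path))) run once per installed version gives two reports that can be compared side by side to see where the extra time goes.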
Attachments
Issue Links
- is related to: ARROW-7059 [Python] Reading parquet file with many columns is much slower in 0.15.x versus 0.14.x (Resolved)
- links to