[ARROW-6417] [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.15.0
Component/s: C++, Python
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22788

Description

While benchmarking, I found that non-dictionary binary reads from Parquet appear to have become slower between Arrow 0.11.1 and the current master branch. It would be a good idea to do some basic profiling to see where we might improve our memory allocation strategy (or whatever the bottleneck turns out to be).
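As a rough illustration of the kind of measurement described above, the following is a minimal sketch and not the attached benchmark script. It assumes pyarrow is installed. The helper names (`best_of`, `benchmark_binary_read`), the column name `f0`, and the default sizes are hypothetical choices made for this example. It writes one binary column with dictionary encoding disabled and times `pq.read_table` on it.

```python
import os
import tempfile
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark_binary_read(num_values=1_000_000, value_size=16, repeats=5):
    """Time pq.read_table on one binary column written without dictionary encoding.

    All parameter defaults are illustrative assumptions, not values from the issue.
    """
    # Imported lazily so the timing helper stays usable without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Random values, so the column is effectively all-unique binary data.
    values = [os.urandom(value_size) for _ in range(num_values)]
    table = pa.Table.from_arrays(
        [pa.array(values, type=pa.binary())], names=["f0"]
    )

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "binary.parquet")
        # use_dictionary=False forces a non-dictionary encoding on write.
        pq.write_table(table, path, use_dictionary=False)
        return best_of(lambda: pq.read_table(path), repeats=repeats)
```

Calling `benchmark_binary_read()` in one environment per Arrow version (for example 0.11.1 and a build of master) should give numbers that can be compared directly. Taking the minimum over several runs is one common way to reduce noise from the rest of the system.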

Attachments

- 20190903_parquet_read_perf.png, 03/Sep/19 03:09, 12 kB, Wes McKinney
- 20190903_parquet_benchmark.py, 02/Sep/19 17:57, 4 kB, Wes McKinney

Issue Links

links to
- GitHub Pull Request #5268
- GitHub Pull Request #5297

People

Assignee: Unassigned

Reporter: Wes McKinney

Votes: 0

Watchers: 4

Dates

Created: 02/Sep/19 17:59

Updated: 11/Jan/23 07:47

Resolved: 14/Oct/19 20:50

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 1.5h