[ARROW-6411] [C++][Parquet] DictEncoderImpl<T>::PutIndicesTyped has bad performance on some systems - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.15.0
Component/s: C++
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/22783

Description

While benchmarking, I noticed that this function shows up as slow, with most of the time spent in __memmove_avx_unaligned_erms. I'd like to investigate why this is, but for me it's fixed by changing the std::vector::reserve call to std::vector::resize and assigning elements into buffered_indices_ instead. I'll add a Python benchmark that illustrates the problem, to see whether it shows up on other systems.
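The change described above can be sketched roughly as follows. This is a minimal illustration, not the actual Arrow code: the function names AppendWithReserve and AppendWithResize, the int32_t element type, and the use of push_back in the "before" variant are assumptions made for the example; only std::vector::reserve, std::vector::resize, and the buffered_indices_ name come from the description. One plausible explanation for the difference, which the issue itself does not confirm, is that reserving exactly size() + n on every call can force a reallocation (and a copy of all existing elements) each time on some standard library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the encoder's index buffering; not the Arrow code.

// Variant A (assumed "before"): reserve capacity, then append element by element.
void AppendWithReserve(std::vector<int32_t>& buffered_indices,
                       const int32_t* indices, std::size_t n) {
  buffered_indices.reserve(buffered_indices.size() + n);
  for (std::size_t i = 0; i < n; ++i) {
    buffered_indices.push_back(indices[i]);
  }
}

// Variant B (the fix described in the issue): resize once, then assign
// into the newly created slots by position.
void AppendWithResize(std::vector<int32_t>& buffered_indices,
                      const int32_t* indices, std::size_t n) {
  const std::size_t old_size = buffered_indices.size();
  buffered_indices.resize(old_size + n);
  assert(buffered_indices.size() == old_size + n);
  for (std::size_t i = 0; i < n; ++i) {
    buffered_indices[old_size + i] = indices[i];
  }
}
```

Both variants leave the vector with the same contents; any performance difference depends on the standard library and platform, which is consistent with the issue showing up only "on some systems".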

Issue Links

is related to

PARQUET-1646 [C++] Use arrow::Buffer for buffered dictionary indices in DictEncoder instead of std::vector

Open

links to

GitHub Pull Request #5248

People

Assignee: Wes McKinney

Reporter: Wes McKinney

Votes: 0

Watchers: 2

Dates

Created: 01/Sep/19 23:14

Updated: 11/Jan/23 07:46

Resolved: 03/Sep/19 02:40

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

1h 20m