[ARROW-8907] [Rust] implement scalar comparison operations - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.0.0
Component/s: Rust
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/25040

Description

Currently, comparing an array to a scalar / literal value using the comparison operations defined in the comparison kernel here:
https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
is very inefficient because:
(1) an array with the scalar value repeated has to be created, which takes time and wastes memory
(2) during the comparison, time is spent loading the same literal value over and over
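
To make the two costs above concrete, here is a minimal sketch over plain `f32` slices. It does not use the arrow crate: the names `eq_array`, `eq_scalar` and `eq_via_repeated_literal` are hypothetical, and null bitmaps and SIMD are ignored. It only illustrates the difference in shape between the two approaches, not the actual kernel implementation.

```rust
/// Array-vs-array equality: both operands are full-length buffers.
/// (Hypothetical helper; simplified stand-in for an array comparison kernel.)
fn eq_array(left: &[f32], right: &[f32]) -> Vec<bool> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter().zip(right.iter()).map(|(l, r)| l == r).collect()
}

/// Array-vs-scalar equality: the literal is passed by value, so no second
/// buffer is allocated and the literal is not re-read from memory per element.
fn eq_scalar(left: &[f32], right: f32) -> Vec<bool> {
    left.iter().map(|l| *l == right).collect()
}

/// The workaround described above: expand the literal into a buffer of the
/// same length (cost 1), then compare element by element against it (cost 2).
fn eq_via_repeated_literal(left: &[f32], literal: f32) -> Vec<bool> {
    let repeated = vec![literal; left.len()];
    eq_array(left, &repeated)
}
```

Both paths produce the same mask; the scalar version simply skips the allocation of `repeated` and the second memory stream.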

Initial benchmarking of a specialized scalar comparison function indicates good performance gains:

eq Float32 time: [938.54 us 950.28 us 962.65 us]
eq scalar Float32 time: [836.47 us 838.47 us 840.78 us]
eq Float32 simd time: [75.836 us 76.389 us 77.185 us]
eq scalar Float32 simd time: [61.551 us 61.605 us 61.671 us]

The benchmark results above show that the scalar comparison function is about 12% faster for non-SIMD and about 20% faster for SIMD comparison operations, and this is before accounting for the cost of creating the literal array.
In a more complex benchmark, the scalar comparison version is about 40% faster overall once we account for not having to create arrays of scalar / literal values.
Here are the benchmark results:

filter/filter with arrow SIMD (array) time: [647.77 us 675.12 us 706.69 us]
filter/filter with arrow SIMD (scalar) time: [402.19 us 404.23 us 407.22 us]
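
The percentages quoted in this description can be reproduced from the reported timings. The sketch below assumes that the middle value in each bracket is the point estimate and the outer two are bounds; `percent_faster` and `reported_speedups` are hypothetical helper names used only for this illustration.

```rust
/// Relative reduction in running time, in percent, of `candidate_us`
/// compared with `baseline_us` (both in microseconds).
fn percent_faster(baseline_us: f64, candidate_us: f64) -> f64 {
    (baseline_us - candidate_us) / baseline_us * 100.0
}

/// Speedups computed from the middle values reported above, in order:
/// eq Float32 (non-SIMD), eq Float32 (SIMD), filter with arrow SIMD.
fn reported_speedups() -> [f64; 3] {
    [
        percent_faster(950.28, 838.47), // roughly 12%
        percent_faster(76.389, 61.605), // roughly 19%
        percent_faster(675.12, 404.23), // roughly 40%
    ]
}
```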

And here is the code for the benchmark:
https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230

My only concern is that I can't see an easy way to use scalar comparison operations in DataFusion, as it is currently designed to work only on arrays.

paddyhoran, andygrove: let me know what you think. Would there be value in implementing scalar comparison operations?

Issue Links

is related to

ARROW-10173 [Rust][DataFusion] Improve performance of equality to a constant predicate support

Resolved

links to

GitHub Pull Request #7261

People

Assignee: Unassigned

Reporter: Yordan Pavlov

Votes: 0

Watchers: 4

Dates

Created: 23/May/20 10:58

Updated: 11/Jan/23 08:03

Resolved: 03/Jun/20 12:34

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 50m