[ARROW-9523] [Rust] improve performance of filter kernel - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.17.1
Fix Version/s: 2.0.0
Component/s: Rust
Labels:
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/25591

Description

The filter kernel located here https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/filter.rs

currently has the following performance:

filter old u8 low selectivity time: [1.7782 ms 1.7790 ms 1.7801 ms]
filter old u8 high selectivity time: [815.58 us 816.58 us 817.57 us]
filter old u8 w NULLs low selectivity time: [1.8131 ms 1.8231 ms 1.8336 ms]
filter old u8 w NULLs high selectivity time: [817.41 us 820.01 us 823.05 us]

I have been working on a new implementation which performs between approximately 14 and 480 times faster depending mostly on filter selectivity. Here are the benchmark results:

filter u8 low selectivity time: [127.30 us 128.06 us 128.88 us]
filter u8 high selectivity time: [5.4215 us 5.5778 us 5.7335 us]
filter context u8 low selectivity time: [124.21 us 126.21 us 128.38 us]
filter context u8 high selectivity time: [1.6707 us 1.7052 us 1.7476 us]
filter context u8 w NULLs low selectivity time: [142.40 us 142.83 us 143.37 us]
filter context u8 w NULLs high selectivity time: [2.3338 us 2.3788 us 2.4304 us]
filter context f32 low selectivity time: [132.59 us 132.91 us 133.29 us]
filter context f32 high selectivity time: [1.6864 us 1.7026 us 1.7212 us]

This new implementation is based on a few key ideas:

(1) if the data array being filtered doesn't have a null bitmap, no time should be wasted to copy or create a null bitmap in the resulting filtered data array - this is implemented using the CopyNullBit trait which has a no-op implementation and an actual implementation

(2) when the filter is highly selective, e.g. only a small number of values from the data array are selected, the filter implementation should efficiently skip entire batches of 0s in the filter array - this is implemented by transmuting the filter array to u64 which allows to quickly check and skip entire batches of 64 bits

(3) when an entire record batch is filtered, any computation which only depends on the filter array is done once and then shared for filtering all the data arrays in the record batch - this is implemented using the FilterContext struct

paddyhoran, andygrove let me know what you think

Attachments

Issue Links

links to

GitHub Pull Request #7798

Activity

People

Assignee:: Unassigned

Reporter:: Yordan Pavlov

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 19/Jul/20 11:26

Updated:: 11/Jan/23 08:07

Resolved:: 06/Aug/20 14:44

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

3h 20m