Details
- Type: Improvement
- Priority: Major
- Status: Closed
- Resolution: Done
Description
I made a smaller, simplified example:
arr = pa.array(np.random.randn(1_000_000))

# mask with only few True values
mask1 = np.zeros(len(arr), dtype=bool)
mask1[np.random.randint(len(arr), size=100)] = True
mask1_pa = pa.array(mask1)

# mask with larger proportion of True values
mask2 = np.zeros(len(arr), dtype=bool)
mask2[np.random.randint(len(arr), size=10_000)] = True
mask2_pa = pa.array(mask2)
Timing the Arrow Filter kernel against using NumPy to convert the mask into indices and then calling the Take kernel:
# mask 1
In [3]: %timeit arr.filter(mask1_pa)
132 µs ± 4.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [4]: %%timeit
   ...: indices = np.nonzero(mask1)[0]
   ...: arr.take(indices)
114 µs ± 2.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# mask 2
In [8]: %timeit arr.filter(mask2_pa)
711 µs ± 63.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [9]: %%timeit
   ...: indices = np.nonzero(mask2)[0]
   ...: arr.take(indices)
333 µs ± 6.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So in the first case (sparse mask), both approaches are quite close in timing, but in the second case the numpy+take version is roughly twice as fast.

I know this depends a lot on the actual proportion of True values and how they are positioned in the array (random vs concentrated), so there is probably no general rule about which approach should be faster.

Still, it seems a potential indication that the Filter kernel can be optimized.
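The equivalence being benchmarked can be sketched with NumPy alone: selecting with a boolean mask ("filter") and selecting with the mask's nonzero positions ("take") give identical results, and only their cost profiles differ with mask density. This is a minimal illustration, not Arrow code; `mask_to_indices` is a hypothetical helper name for the transformation that ARROW-10474 proposes as a dedicated vector kernel.

```python
import numpy as np

def mask_to_indices(mask):
    # Hypothetical helper: turn a boolean selection mask into the
    # integer indices of its True entries.
    return np.nonzero(mask)[0]

rng = np.random.default_rng(0)
arr = rng.standard_normal(1_000)

# sparse mask, analogous to mask1 above
mask = np.zeros(len(arr), dtype=bool)
mask[rng.integers(len(arr), size=50)] = True

# "filter" path: boolean-mask selection
filtered = arr[mask]
# "take" path: convert mask to indices, then gather
taken = np.take(arr, mask_to_indices(mask))

assert np.array_equal(filtered, taken)
```

The assertion holds for any mask, which is why the benchmark above can freely substitute one path for the other and compare only their timings.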
Attachments

Issue Links
- is fixed by:
  - ARROW-10696 [C++] Investigate a bit run reader that would only return runs of set bits (Resolved)
- is related to:
  - ARROW-10474 [C++] Implement vector kernel that transforms a boolean mask into selection indices (Open)