[ARROW-17759] [R] Implement dplyr::slice_sample() - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: R
Labels: None

External issue URL:
https://github.com/apache/arrow/issues/20417

Description

slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)

If n is provided, compute nrow(.data) and, if that is not NA, convert n to a prop. (Might want to use prop + .01 or something and then do head() after, i.e. sample more than you need and then take n, so that randomness doesn't leave you with fewer than n rows.)

With prop, turn this into filter(arrow_random() < prop). See ARROW-17572.
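The idea above can be illustrated outside of Arrow with a minimal Python sketch. It is only an illustration of the technique: `sample_prop`, `sample_n`, and the `pad` parameter are hypothetical names, and `rng.random()` stands in for the proposed arrow_random(). Each row is kept independently with probability prop, and for n the proportion is padded slightly before taking the first n kept rows.

```python
import random
from itertools import islice


def sample_prop(rows, prop, rng):
    """Keep each row independently with probability `prop`."""
    # rng.random() stands in for the proposed arrow_random() (assumption).
    return (row for row in rows if rng.random() < prop)


def sample_n(rows, n, nrow, rng, pad=0.01):
    """Approximate an n-row sample: oversample slightly, then keep the first n.

    `pad` is a hypothetical safety margin (the ".01 or something" above).
    """
    if nrow == 0:
        return []
    # Convert n to a proportion, padded and capped at 1.
    prop = min(1.0, n / nrow + pad)
    # Equivalent of head(n) applied to the filtered rows.
    return list(islice(sample_prop(rows, prop, rng), n))


rng = random.Random(0)
picked = sample_n(range(1000), 10, 1000, rng)
```

The padding only makes a shortfall less likely; the result can still contain fewer than n rows. With a sequential scan, taking the first n of the oversampled rows also slightly favors rows that appear earlier.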

Defer weight_by to a followup. It should be doable but might be expensive (need to scan everything to compute sum and ensure that all values are positive).
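To make the cost concrete, here is a minimal Python sketch of the validation pass, assuming the weights are available as a plain iterable. `validate_weights` is a hypothetical helper, not part of Arrow or dplyr. Every value has to be visited once before any sampling can begin, which is the full scan mentioned above.

```python
def validate_weights(weights):
    """Return the sum of the weights, checking that every value is positive.

    This touches every value once, so it costs a full scan of the column.
    """
    total = 0.0
    for w in weights:
        # `not w > 0` also rejects NaN, which compares false to everything.
        if not w > 0:
            raise ValueError(f"weights must be positive, got {w!r}")
        total += w
    return total


total = validate_weights([0.5, 1.5, 2.0])
```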

Defer replace = TRUE.

Also, this can probably only be done if .data is ungrouped; I think the dplyr methods do sampling within groups.

People

Assignee: Unassigned

Reporter: Neal Richardson

Votes: 0

Watchers: 2

Dates

Created: 16/Sep/22 17:18

Updated: 11/Jan/23 11:55