
Details

    • Sub-task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • R
    • None

    Description

      slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)
      

      If n is provided, compute nrow(.data), and if that is not NA, convert it to a prop. (Might want to do prop + .01 or something and then do head after, i.e. sample more than you need and then take n, just so you don't by randomness get fewer than n.)

      With prop, turn this into filter(arrow_random() < prop). See ARROW-17572.
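
      A minimal base R sketch of the two ideas above, under stated assumptions: runif() stands in for the per-row random function, a plain data frame stands in for the Arrow query, and the helper names sample_prop and sample_n_via_prop are hypothetical (they are not part of arrow or dplyr).

```r
# Hypothetical helpers for illustration; neither is part of arrow or dplyr.

# Keep each row independently with probability `prop`.
# runif() stands in for the per-row random function mentioned above.
sample_prop <- function(df, prop) {
  stopifnot(is.numeric(prop), length(prop) == 1, prop >= 0, prop <= 1)
  df[runif(nrow(df)) < prop, , drop = FALSE]
}

# Convert `n` to a padded proportion, sample, then trim to at most `n` rows.
sample_n_via_prop <- function(df, n, pad = 0.01) {
  total <- nrow(df)
  # For a data frame this is never NA; for a lazy query it could be.
  if (is.na(total)) {
    stop("row count is unknown, so n cannot be converted to prop")
  }
  if (total == 0) {
    return(df)
  }
  # Oversample slightly so random variation rarely leaves fewer than n rows.
  prop <- min(1, n / total + pad)
  head(sample_prop(df, prop), n)
}

set.seed(42)
df <- data.frame(id = 1:1000)
nrow(sample_prop(df, 0.1))       # close to 100, but varies with the seed
nrow(sample_n_via_prop(df, 50))  # never more than 50
```

      The padding only makes a short result less likely, it does not rule it out. Also, head() keeps the earliest sampled rows, so rows near the start are slightly favored.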

      Defer weight_by to a followup. It should be doable but might be expensive (need to scan everything to compute sum and ensure that all values are positive).
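
      To illustrate why weight_by might be expensive, here is a rough sketch of the validation it implies. The helper normalize_weights is hypothetical and works on an in-memory vector; on a dataset, both the positivity check and the sum would need a pass over the whole column before any sampling can start.

```r
# Hypothetical helper: validates weights and returns sampling probabilities.
normalize_weights <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0)
  if (anyNA(w)) {
    stop("weights must not contain NA")
  }
  # Every value has to be inspected to confirm positivity.
  if (any(w <= 0)) {
    stop("weights must all be positive")
  }
  # The sum also requires reading every value.
  w / sum(w)
}

normalize_weights(c(1, 3))
```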

      Defer replace = TRUE.

      Also, this can probably only be done if .data is ungrouped; I think the dplyr methods do sampling within groups.
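
      One possible shape for that restriction, as a sketch only: the guard below rejects grouped input up front. The function name assert_ungrouped is hypothetical, and the class check applies to dplyr's grouped data frames; an Arrow query tracks grouping differently, so a real implementation would need its own test.

```r
# Hypothetical guard; "grouped_df" is the class dplyr gives grouped data frames.
assert_ungrouped <- function(.data) {
  if (inherits(.data, "grouped_df")) {
    stop("slice_sample() sketch: grouped input is not supported")
  }
  invisible(.data)
}

assert_ungrouped(data.frame(x = 1:3))  # passes silently
```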

          People

            Assignee: Unassigned
            Reporter: Neal Richardson (npr)
