DataFu / DATAFU-5

Update SimpleRandomSample (SRS) to be consistent with SimpleRandomSampleWithReplacement (SRSWR)


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Labels:
      None

      Description

      In the current implementation, SRS takes the sampling probability as an argument to the UDF constructor, while SRSWR takes the sample size as an argument to the function call. The attached patch updates SRS to be consistent with SRSWR.

      After the patch, SRS takes as input a bag of items, a desired sampling probability, and, optionally, a lower bound on the population size. SRSWR takes as input a bag of items, a desired sample size, and, optionally, a lower bound on the population size.

      Another benefit of the patch is that users no longer have to create a separate instance of the UDF for each sampling probability they want to use.
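      To make the calling convention concrete, here is a minimal plain-Java sketch. It is not DataFu's implementation, and it is not a Pig UDF. The class and method names (SamplingSketch, simpleRandomSample, simpleRandomSampleWithReplacement) are hypothetical. It only illustrates the shape described above: the probability (SRS) or the sample size (SRSWR) is passed on every call, so a single method serves any probability. The sketch assumes the whole bag fits in memory, so the population size n is known exactly and the optional population lower bound is omitted. It also assumes that SRS returns ceil(p * n) distinct items, which is a convention chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not the DataFu API.
public class SamplingSketch {

    // SRS-style call: the sampling probability p is a per-call argument.
    // Assumption of this sketch: returns ceil(p * n) distinct items,
    // chosen uniformly at random without replacement.
    public static <T> List<T> simpleRandomSample(List<T> items, double p, Random rng) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        int n = items.size();
        int k = (int) Math.ceil(p * n);
        List<T> copy = new ArrayList<>(items); // leave the caller's list untouched
        // Partial Fisher-Yates shuffle: after i steps, the first i slots
        // hold a uniform random subset drawn without replacement.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            Collections.swap(copy, i, j);
        }
        return new ArrayList<>(copy.subList(0, k));
    }

    // SRSWR-style call: the sample size k is a per-call argument.
    // Each draw is independent and uniform, so duplicates are possible.
    public static <T> List<T> simpleRandomSampleWithReplacement(List<T> items, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        if (items.isEmpty() && k > 0) {
            throw new IllegalArgumentException("cannot sample from an empty bag");
        }
        List<T> sample = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            sample.add(items.get(rng.nextInt(items.size())));
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> bag = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            bag.add(i);
        }
        Random rng = new Random(42);
        // One method, several probabilities: no per-probability instance needed.
        System.out.println(simpleRandomSample(bag, 0.10, rng));
        System.out.println(simpleRandomSample(bag, 0.25, rng));
        System.out.println(simpleRandomSampleWithReplacement(bag, 5, rng));
    }
}
```

      The partial Fisher-Yates shuffle is used here only because it is a simple, exact way to draw without replacement from an in-memory list. A distributed sampler that sees the bag in pieces would need a different technique, which is where a population lower bound becomes useful.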

        Attachments

        1. DATAFU-5.patch
          18 kB
          Xiangrui Meng
        2. DATAFU-5.patch
          29 kB
          Xiangrui Meng
        3. 0001-update-SimpleRandomSample-to-be-consistent-with-Simp.patch
          27 kB
          Xiangrui Meng
        4. DATAFU-5.patch
          28 kB
          Matthew Hayes


            People

            • Assignee:
              mengxr Xiangrui Meng
              Reporter:
              mengxr Xiangrui Meng
            • Votes:
              0
              Watchers:
              2

              Dates

              • Created:
                Updated:
                Resolved: