DataFu / DATAFU-117

New UDF - CountDistinctUpTo


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 1.3.1
    • None

    Description

      A UDF that counts distinct tuples within a bag, but only up to a preset limit. If the bag contains more distinct tuples than the limit, the UDF returns the limit.

      Although the count is done in memory, this UDF runs reasonably well even on large bags, as long as the chosen limit is small enough.
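
      The behavior described above can be sketched in plain Java. This is only an illustration, not the code from the attached patches: the class and method names are hypothetical, a generic Iterable stands in for a Pig bag, and returning 0 for a non-positive limit is an assumption. The sketch stops scanning once the limit is reached, so the in-memory set never holds more than `limit` items, which is presumably why a small limit keeps the cost low on large bags.

      ```java
      import java.util.Arrays;
      import java.util.HashSet;
      import java.util.List;
      import java.util.Set;

      // Hypothetical illustration; not the DataFu implementation.
      class CountDistinctUpToSketch {

          /**
           * Counts distinct items, but stops as soon as `limit` distinct
           * items have been seen and returns `limit` in that case.
           */
          static <T> int countDistinctUpTo(Iterable<T> items, int limit) {
              if (limit <= 0) {
                  return 0; // assumption: a non-positive limit yields 0
              }
              Set<T> seen = new HashSet<>();
              for (T item : items) {
                  // add() returns true only for items not seen before
                  if (seen.add(item) && seen.size() >= limit) {
                      return limit; // early exit: the set never grows past `limit`
                  }
              }
              return seen.size();
          }

          public static void main(String[] args) {
              List<String> bag = Arrays.asList("a", "b", "a", "c", "b");
              System.out.println(countDistinctUpTo(bag, 10)); // 3 distinct items, below the limit
              System.out.println(countDistinctUpTo(bag, 2));  // capped at the limit, 2
          }
      }
      ```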

      We use this UDF at PayPal for filtering, when we don't need the actual tuples afterward.
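
      As a rough illustration of the filtering use case, the sketch below keeps only the groups that contain at least a given number of distinct items, without retaining the distinct items themselves. The grouping data, class name, and method names are invented for the example and do not reflect any actual PayPal job; the bounded count here uses the standard Java stream operations `distinct()` and `limit()` rather than a Pig UDF, and it assumes a non-negative threshold (a negative one makes `limit()` throw).

      ```java
      import java.util.Arrays;
      import java.util.Collection;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Map;
      import java.util.stream.Collectors;

      // Hypothetical illustration of filtering on a bounded distinct count.
      class DistinctLimitFilterSketch {

          /** Bounded distinct count: at most `limit` distinct items are counted. */
          static <T> long countDistinctUpTo(Collection<T> bag, int limit) {
              return bag.stream().distinct().limit(limit).count();
          }

          /** Returns the keys whose bag holds at least `minDistinct` distinct items. */
          static <K, T> List<K> keysWithAtLeast(Map<K, List<T>> groups, int minDistinct) {
              return groups.entrySet().stream()
                      // the count equals the limit exactly when the bag reaches the threshold
                      .filter(e -> countDistinctUpTo(e.getValue(), minDistinct) == minDistinct)
                      .map(Map.Entry::getKey)
                      .collect(Collectors.toList());
          }

          public static void main(String[] args) {
              Map<String, List<String>> groups = new LinkedHashMap<>();
              groups.put("u1", Arrays.asList("a", "b", "a"));
              groups.put("u2", Arrays.asList("a", "a"));
              groups.put("u3", Arrays.asList("a", "b", "c"));
              System.out.println(keysWithAtLeast(groups, 2)); // u1 and u3 pass, u2 does not
          }
      }
      ```

      Comparing the result against the limit is enough for this kind of predicate, because the filter only needs to know whether the threshold was reached, not the exact count.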

      Attachments

        1. DATAFU-117.patch
          6 kB
          Eyal Allweil
        2. DATAFU-117-2.patch
          20 kB
          Eyal Allweil
        3. DATAFU-117-3.patch
          19 kB
          Eyal Allweil
        4. DATAFU-117-4.patch
          19 kB
          Eyal Allweil

        Activity

          People

            Assignee: Eyal Allweil
            Reporter: Eyal Allweil
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: