  Project: Crunch (Retired)
  Issue: CRUNCH-118

Add Aggregator and lib functionality for computing the distinct elements of a PCollection

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: MapReduce Patterns
    • Labels: None

    Description

      In my continuing effort to add functions that I like from Hive/Pig to Crunch, I'm proposing a) an Aggregator that behaves like Hive's collect_set() UDAF and b) an o.a.c.lib.Distinct library for computing the distinct elements of a PCollection via a groupByKey operation.
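      Both proposals can be illustrated without a cluster. The following is a minimal, framework-free Java sketch, not the code from the attached patches: `DistinctSketch`, `CollectSetAggregator`, and `distinct` are hypothetical names, the reset/update/results shape of the aggregator is an assumption, and an in-memory map stands in for the shuffle that a real groupByKey would perform.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctSketch {

  /**
   * Hypothetical collect_set()-style aggregator: it accumulates the distinct
   * values it is fed and emits them as a single set. The reset/update/results
   * methods are an assumed shape, not a confirmed Crunch signature.
   */
  static final class CollectSetAggregator<T> {
    private final Set<T> values = new LinkedHashSet<>();

    // Clear state before aggregating a new group.
    void reset() {
      values.clear();
    }

    // Duplicates are dropped because the backing collection is a Set.
    void update(T value) {
      values.add(value);
    }

    // Return a copy so callers cannot mutate internal state.
    Set<T> results() {
      return new LinkedHashSet<>(values);
    }
  }

  /**
   * Distinct via a simulated groupByKey: each element becomes a key with a
   * placeholder value, grouping brings equal keys together, and the "reduce"
   * step emits every key exactly once. The map only simulates the shuffle.
   */
  static <T> List<T> distinct(Iterable<T> input) {
    // Map phase + shuffle: emit (element, placeholder) and group by element.
    Map<T, List<Boolean>> grouped = new LinkedHashMap<>();
    for (T element : input) {
      grouped.computeIfAbsent(element, k -> new ArrayList<>()).add(Boolean.TRUE);
    }
    // Reduce phase: ignore the grouped placeholders and emit each key once.
    return new ArrayList<>(grouped.keySet());
  }

  public static void main(String[] args) {
    List<String> words = List.of("a", "b", "a", "c", "b");

    CollectSetAggregator<String> agg = new CollectSetAggregator<>();
    agg.reset();
    for (String w : words) {
      agg.update(w);
    }
    System.out.println(agg.results());   // [a, b, c]
    System.out.println(distinct(words)); // [a, b, c]
  }
}
```

      Set union is associative and commutative, so partial sets built by a collect_set-style aggregator in a combiner can be merged safely on the reduce side. The groupByKey approach to distinct works because the shuffle routes every copy of a value to the same reducer, which then emits that key once.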

      Attachments

        1. CRUNCH-118.patch (10 kB, Josh Wills)
        2. CRUNCH-118v2.patch (14 kB, Josh Wills)
        3. CRUNCH-118v3.patch (19 kB, Matthias Friedrich)

          People

            Assignee: Josh Wills (jwills)
            Reporter: Josh Wills (jwills)
            Votes: 0
            Watchers: 2