Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
None
None
Description
In my continuing effort to add functions that I like from Hive/Pig to Crunch, I'm proposing a) an Aggregator that behaves like Hive's collect_set() UDAF and b) an o.a.c.lib.Distinct library for computing the distinct elements of a PCollection via a groupByKey operation.
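As a rough illustration of the two ideas proposed above, here is a minimal sketch in plain Java collections. It does not use the Crunch API: the class name `DistinctSketch` and the methods `distinct` and `collectSet` are hypothetical names chosen for this example, and the in-memory maps only stand in for what a groupByKey over a PCollection would do in a real pipeline.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT the Crunch API.
final class DistinctSketch {

    // Distinct via grouping: treat each element as its own key, group on it,
    // and emit one representative per group (the key itself).
    static <T> List<T> distinct(List<T> input) {
        Map<T, List<T>> groups = new LinkedHashMap<>();
        for (T element : input) {
            groups.computeIfAbsent(element, k -> new ArrayList<>()).add(element);
        }
        return new ArrayList<>(groups.keySet());
    }

    // collect_set-style aggregation: for each key, gather its values into a
    // set so that duplicate values are dropped.
    static <K, V> Map<K, Set<V>> collectSet(List<Map.Entry<K, V>> pairs) {
        Map<K, Set<V>> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            result.computeIfAbsent(pair.getKey(), k -> new LinkedHashSet<>())
                  .add(pair.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [a, b, c]
        System.out.println(distinct(List.of("a", "b", "a", "c", "b")));

        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("x", 1), Map.entry("x", 1),
                Map.entry("y", 2), Map.entry("x", 3));
        // prints {x=[1, 3], y=[2]}
        System.out.println(collectSet(pairs));
    }
}
```

In a distributed setting the same grouping step would presumably run as a shuffle, so the sketch only shows the logical shape of the operation, not its cost or the eventual library interface.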