The Hadoop Abacus package is a specialization of the map/reduce framework
for performing various kinds of counting and aggregation.
It offers functionality similar to Google's Sawzall.
Generally speaking, to implement an application using the Map/Reduce model,
the developer needs to implement Map and Reduce functions (and possibly a Combine function).
However, for many applications that compute counts and statistics,
these functions have very similar characteristics.
Abacus abstracts out the general patterns and provides a package implementing those patterns.
In particular, the package provides a generic mapper class, a reducer class and a combiner class,
and a set of built-in value aggregators. It also provides a generic utility class, ValueAggregatorJob,
for creating Abacus jobs.
To create an Abacus job, the user only needs to implement one plugin class that
specifies which aggregators to use and which values go to which aggregator.
The mapper calls this class at runtime to generate aggregation ids and values.
The generic combiner and reducer then aggregate the values associated with the same
aggregation id. Thus, it is much easier to create and run an Abacus job than
a normal map/reduce job. Because the built-in generic combiner is always used, values are partially aggregated before they reach the reducer, which keeps execution efficient.
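The division of labour described above can be illustrated with a small, framework-free sketch. It is not the Abacus API: the interfaces `ValueAggregator` and `Descriptor`, the class `WordStatsDescriptor`, the `type:name` aggregation id format, and the aggregator names `LongValueSum` and `UniqValueCount` are all assumptions chosen for illustration, and the `run` method collapses the mapper, combiner and reducer stages into a single in-memory pass. The point is only that the user-written part shrinks to one class that emits (aggregation id, value) pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class AbacusSketch {

    /** Hypothetical contract for a built-in value aggregator. */
    interface ValueAggregator {
        void add(String value);
        String report();
    }

    /** Sums the values it receives, parsed as longs. */
    static class LongValueSum implements ValueAggregator {
        private long sum = 0;
        public void add(String value) { sum += Long.parseLong(value); }
        public String report() { return Long.toString(sum); }
    }

    /** Counts the distinct values it receives. */
    static class UniqValueCount implements ValueAggregator {
        private final Set<String> seen = new HashSet<>();
        public void add(String value) { seen.add(value); }
        public String report() { return Integer.toString(seen.size()); }
    }

    /** Hypothetical plugin interface: the only thing the user implements. */
    interface Descriptor {
        /** Returns {aggregationId, value} pairs for one input record. */
        List<String[]> generateKeyValPairs(String record);
    }

    /** Example plugin: per-word counts plus the number of distinct words. */
    static class WordStatsDescriptor implements Descriptor {
        public List<String[]> generateKeyValPairs(String record) {
            List<String[]> out = new ArrayList<>();
            for (String word : record.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // skip blank records
                out.add(new String[] {"LongValueSum:" + word, "1"});
                out.add(new String[] {"UniqValueCount:distinct_words", word});
            }
            return out;
        }
    }

    /** Picks the aggregator from the type prefix of the aggregation id. */
    static ValueAggregator newAggregator(String aggregationId) {
        String type = aggregationId.substring(0, aggregationId.indexOf(':'));
        switch (type) {
            case "LongValueSum": return new LongValueSum();
            case "UniqValueCount": return new UniqValueCount();
            default: throw new IllegalArgumentException("Unknown aggregator type: " + type);
        }
    }

    /** Stand-in for the generic mapper, combiner and reducer in one pass. */
    static Map<String, String> run(Descriptor descriptor, List<String> records) {
        Map<String, ValueAggregator> aggregators = new TreeMap<>();
        for (String record : records) {
            // "Map" step: the plugin turns a record into (id, value) pairs.
            for (String[] pair : descriptor.generateKeyValPairs(record)) {
                // "Combine/reduce" step: values with the same id meet in one aggregator.
                aggregators.computeIfAbsent(pair[0], AbacusSketch::newAggregator).add(pair[1]);
            }
        }
        Map<String, String> results = new TreeMap<>();
        aggregators.forEach((id, agg) -> results.put(id, agg.report()));
        return results;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("to be or not to be", "to do");
        run(new WordStatsDescriptor(), input)
            .forEach((id, value) -> System.out.println(id + "\t" + value));
    }
}
```

In this sketch, adding another statistic means emitting one more pair from `WordStatsDescriptor`; the generic aggregation code in `run` does not change.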
Relates to: HADOOP-1547 Provide examples for aggregate library