DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many offline workflows for data-derived products like "People You May Know" and "Skills".
DataFu is available under the Apache License v2 from LinkedIn's GitHub project page: https://github.com/linkedin/datafu
The latest release of DataFu is: 0.0.4
Note: this will also open up the possibility for Bigtop to start collecting custom UDF implementations for other projects like Hive, etc. For now, I simply propose an extra package called pig-udf-datafu