BIGTOP-669

Add DataFu to Bigtop distribution

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0, 0.5.0
    • Component/s: general
    • Labels: None

    Description

      DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like "People You May Know" and "Skills".

      DataFu is available under the Apache License v2 from their GitHub project page: https://github.com/linkedin/datafu

      The latest release of DataFu is: 0.0.4

      Note: this will also open up the possibility for Bigtop to start collecting custom UDF implementations for other projects like Hive, etc. For now, I simply propose an extra package called pig-udf-datafu.

      Attachments

        1. BIGTOP-669.patch.txt
          15 kB
          Roman Shaposhnik

        Activity

          People

            Assignee: Roman Shaposhnik (rvs)
            Reporter: Roman Shaposhnik (rvs)
            Votes: 1
            Watchers: 3
