  1. Bigtop
  2. BIGTOP-669

Add DataFu to Bigtop distribution

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0, 0.5.0
    • Component/s: general
    • Labels:
      None

      Description

      DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like "People You May Know" and "Skills".

      DataFu is available under the Apache License v2 from their GitHub project page: https://github.com/linkedin/datafu

      The latest release of DataFu is: 0.0.4

      Note: this will also open up the possibility for Bigtop to start collecting custom UDF implementations for other projects such as Hive. For now, I simply propose an extra package called pig-udf-datafu.

        Attachments

        1. BIGTOP-669.patch.txt
          15 kB
          Roman Shaposhnik

            People

            • Assignee:
              rvs Roman Shaposhnik
            • Reporter:
              rvs Roman Shaposhnik
            • Votes:
              1
            • Watchers:
              3
