Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.4.0
    • Fix Version/s: 0.4.0, 0.5.0
    • Component/s: General
    • Labels: None

      Description

      DataFu is a collection of user-defined functions (UDFs) for working with large-scale data in Hadoop and Pig. The library was born out of the need for a stable, well-tested set of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like "People You May Know" and "Skills".

      DataFu is available under the Apache License v2 from its GitHub project page: https://github.com/linkedin/datafu

      The latest release of DataFu is: 0.0.4

      Note: this also opens up the possibility for Bigtop to start collecting custom UDF implementations for other projects, such as Hive. For now, I simply propose an extra package called pig-udf-datafu.

      1. BIGTOP-669.patch.txt
        15 kB
        Roman Shaposhnik

        Activity

        Andrew Bayer added a comment -

        +1


          People

          • Assignee:
            Roman Shaposhnik
            Reporter:
            Roman Shaposhnik
          • Votes:
            1
            Watchers:
            3

