DataFu / DATAFU-173

Change UDAFs to use Aggregator instead of UserDefinedAggregateFunction


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None

    Description

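      Currently our UDAFs extend the UserDefinedAggregateFunction class. This has two drawbacks:

      1) It is less efficient than Aggregator, because the aggregation buffer is converted between Spark's internal representation and Row objects on every update, while an Aggregator keeps its buffer as an ordinary JVM object.

      2) UserDefinedAggregateFunction has been deprecated since Spark 3.0.0 in favor of registering an Aggregator through functions.udaf.

      This story is for changing them to use Aggregator.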

       

      The UDAFs are located here:

      https://github.com/apache/datafu/blob/main/datafu-spark/src/main/scala/datafu/spark/SparkUDAFs.scala

       

      Here are some links explaining how to do this:

      https://stackoverflow.com/questions/48180598/spark-what-is-the-difference-between-aggregator-and-udaf

      https://stackoverflow.com/questions/66808917/apache-spark-how-to-define-a-userdefinedaggregatefunction-after-3
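      As a rough illustration of the pattern described in those links, here is a minimal sketch of an Aggregator, assuming Spark 3.x. CappedDistinctCount is a hypothetical example written for this issue, not one of the UDAFs in SparkUDAFs.scala: it counts the distinct non-null strings in a group but stops collecting once maxItems values have been seen. Where a UserDefinedAggregateFunction declares a buffer schema and reads and writes it through a MutableAggregationBuffer, an Aggregator implements zero, reduce, merge and finish over a typed buffer, plus encoders for the buffer and the output.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical example, not DataFu's code: counts distinct non-null strings
// per group, but keeps at most `maxItems` of them, so the result is
// min(number of distinct values, maxItems).
class CappedDistinctCount(maxItems: Int) extends Aggregator[String, Set[String], Int] {
  require(maxItems >= 0, "maxItems must be non-negative")

  // Empty buffer for a new group.
  override def zero: Set[String] = Set.empty

  // Fold one input value into the buffer. The buffer is an ordinary Scala Set
  // rather than a Row read from and written back to a MutableAggregationBuffer.
  override def reduce(buf: Set[String], value: String): Set[String] =
    if (value == null || buf.size >= maxItems) buf else buf + value

  // Combine partial buffers computed on different partitions.
  override def merge(b1: Set[String], b2: Set[String]): Set[String] = {
    val union = b1 ++ b2
    if (union.size > maxItems) union.take(maxItems) else union
  }

  // Turn the final buffer into the result value.
  override def finish(buf: Set[String]): Int = buf.size

  // Kryo-serialized buffer; the output is a plain integer column.
  override def bufferEncoder: Encoder[Set[String]] = Encoders.kryo[Set[String]]
  override def outputEncoder: Encoder[Int] = Encoders.scalaInt
}
```

      Assuming Spark 3.0 or later, an instance could be wrapped as functions.udaf(new CappedDistinctCount(10)) and applied to a column inside groupBy(...).agg(...). Kryo is used for the buffer here only for simplicity.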

       

      This change should be backwards compatible if possible; the tests in TestSparkUDAFs should all still pass.
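      One possible way to keep call sites compatible, assuming Spark 3.0 or later (where functions.udaf was added), is to wrap each Aggregator with functions.udaf. The resulting UserDefinedFunction can be applied to Columns and registered for SQL with spark.udf.register, much as the old UserDefinedAggregateFunction instances could. The NonNullCount aggregator, the RegistrationExample object and the non_null_count SQL name below are hypothetical and exist only for this sketch.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.{Aggregator, UserDefinedFunction}
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical aggregator used only to illustrate registration:
// counts the non-null strings in a group.
object NonNullCount extends Aggregator[String, Long, Long] {
  override def zero: Long = 0L
  override def reduce(n: Long, s: String): Long = if (s == null) n else n + 1
  override def merge(a: Long, b: Long): Long = a + b
  override def finish(n: Long): Long = n
  override def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  override def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

// Hypothetical holder object for the wrapped function.
object RegistrationExample {
  // functions.udaf wraps the Aggregator in a UserDefinedFunction that,
  // like the old UserDefinedAggregateFunction, can be applied to Columns.
  val nonNullCount: UserDefinedFunction = udaf(NonNullCount)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udaf-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(("x", "a"), ("x", null), ("y", "b")).toDF("k", "v")

    // DataFrame API: same call shape as applying the old UDAF object to a column.
    df.groupBy("k").agg(nonNullCount(col("v")).as("n")).show()

    // SQL: register the function under a name, as the old API also allowed.
    spark.udf.register("non_null_count", nonNullCount)
    df.createOrReplaceTempView("t")
    spark.sql("SELECT k, non_null_count(v) AS n FROM t GROUP BY k").show()

    spark.stop()
  }
}
```

      Callers that depend on the UserDefinedAggregateFunction type itself, rather than on applying the function to columns or calling it by name, would still need changes.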


People

    • Assignee: Eyal Allweil
    • Reporter: Eyal Allweil
    • Votes: 0
    • Watchers: 1


Time Tracking

    • Original Estimate: Not Specified
    • Remaining Estimate: 0h
    • Time Spent: 40m