  DataFu / DATAFU-168

Support Spark 2.4.6 and up - fix collectLimitedList compilation


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 1.6.1, 1.7.0, 1.8.0
    • 2.0.0
    • None

    Description

      Once DATAFU-167 is merged, datafu-spark will support Spark versions up to 2.4.5. However, our implementation of collectLimitedList extends Spark's Collect, and Collect's interface changed in 2.4.6, so our code no longer compiles against that version.

      Here is the relevant line in collectLimitedList: https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/spark/utils/overwrites/SparkOverwriteUDAFs.scala#L104

      Here is the compilation error:

      /Users/eyal/git/datafu/datafu-spark/src/main/scala/spark/utils/overwrites/SparkOverwriteUDAFs.scala:104: class CollectLimitedList needs to be abstract, since:
      it has 3 unimplemented members.
      /** As seen from class CollectLimitedList, the missing signatures are as follows.
       *  For convenience, these are usable as stub implementations.
       */
        // Members declared in org.apache.spark.sql.catalyst.expressions.aggregate.Collect
        protected val bufferElementType: org.apache.spark.sql.types.DataType = ???
        protected def convertToBufferElement(value: Any): Any = ???
        // Members declared in org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate
        def eval(buffer: scala.collection.mutable.ArrayBuffer[Any]): Any = ???
      case class CollectLimitedList(child: Expression,
                 ^
      one error found
      FAILURE: Build failed with an exception.
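
To make the failure concrete, here is a minimal, Spark-free model of it. `CollectLike` is a hypothetical stand-in for Spark 2.4.6's `Collect`, reduced to the three members the compiler lists above (with Spark's `DataType` modeled as a plain `String`), and `LimitedListSketch` is a hypothetical stand-in for `CollectLimitedList` that supplies them. The identity conversion and the `List` returned by `eval` are assumptions of this sketch, not DataFu's or Spark's actual code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for Spark 2.4.6's Collect (not Spark's real API).
// A subclass must now supply the buffer element type, a per-element conversion and eval;
// a subclass written against 2.4.5 that omits them fails with "needs to be abstract".
abstract class CollectLike {
  // Modeled as a String tag; Spark uses org.apache.spark.sql.types.DataType here.
  protected val bufferElementType: String
  protected def convertToBufferElement(value: Any): Any
  def eval(buffer: ArrayBuffer[Any]): Any

  // Shared update logic that depends on the abstract conversion hook; nulls are skipped.
  def update(buffer: ArrayBuffer[Any], value: Any): ArrayBuffer[Any] = {
    if (value != null) buffer += convertToBufferElement(value)
    buffer
  }
}

// Hypothetical stand-in for CollectLimitedList: keeps at most maxItems elements.
final case class LimitedListSketch(elementType: String, maxItems: Int) extends CollectLike {
  override protected val bufferElementType: String = elementType

  // Assumption of this sketch: elements are stored unchanged.
  override protected def convertToBufferElement(value: Any): Any = value

  // Stop accepting values once the limit is reached.
  override def update(buffer: ArrayBuffer[Any], value: Any): ArrayBuffer[Any] =
    if (buffer.size >= maxItems) buffer else super.update(buffer, value)

  // Assumption of this sketch: return an immutable List instead of Spark's array data.
  override def eval(buffer: ArrayBuffer[Any]): Any = buffer.toList
}
```

With `maxItems = 2`, feeding `1, null, 2, 3` through `update` skips the `null`, stops at the limit, and `eval` returns `List(1, 2)`. Deleting any of the three overrides reproduces the same kind of "needs to be abstract" error shown in the log.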
      

      We need to either (1) update our implementation and drop support for older Spark versions (releasing this in our version 1.8.0), or (2) copy the code in a backwards-compatible way.
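
Option (2) can be pictured with another Spark-free sketch, assuming DataFu keeps its own copy of the collect logic rather than extending Spark's `Collect`. `OwnedCollect` and `OwnedLimitedList` are hypothetical names. A real copy would still have to plug into Spark's aggregate API (the compiler output shows `eval` is declared in `TypedImperativeAggregate`); that part is left out so the sketch runs on its own. Because DataFu would own the base class, a later change to `Collect` in a Spark patch release would not change which members the subclass must implement.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical DataFu-owned base class standing in for a copy of Spark's Collect.
// Its abstract members are defined by DataFu, so they stay fixed across Spark releases.
abstract class OwnedCollect[T] {
  def createAggregationBuffer(): ArrayBuffer[T] = ArrayBuffer.empty[T]
  def update(buffer: ArrayBuffer[T], value: T): ArrayBuffer[T]
  def merge(buffer: ArrayBuffer[T], other: ArrayBuffer[T]): ArrayBuffer[T]
  def eval(buffer: ArrayBuffer[T]): List[T] = buffer.toList
}

// Hypothetical limited collect: never holds more than maxItems elements,
// both when adding single values and when merging partial buffers.
final case class OwnedLimitedList[T](maxItems: Int) extends OwnedCollect[T] {
  require(maxItems >= 0, "maxItems must be non-negative")

  override def update(buffer: ArrayBuffer[T], value: T): ArrayBuffer[T] = {
    if (value != null && buffer.size < maxItems) buffer += value
    buffer
  }

  // Keep the first buffer's elements and top up from the second until the limit.
  override def merge(buffer: ArrayBuffer[T], other: ArrayBuffer[T]): ArrayBuffer[T] = {
    buffer ++= other.take(maxItems - buffer.size)
    buffer
  }
}
```

In this sketch, merging a partial buffer holding `1, 2` with one holding `3, 4` under `maxItems = 3` yields `List(1, 2, 3)`; which elements survive depends on merge order, which is acceptable for a limited list that makes no ordering promise.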

      You can reproduce this compilation error on the master branch, even without merging DATAFU-167, by running:

      ./gradlew :datafu-spark:test -PscalaVersion=2.11 -PsparkVersion=2.4.6 --tests "DataFrame*"

      Attachments

        1. DATAFU-168.patch (3 kB, Yanir)

            People

              Assignee: Unassigned
              Reporter: Eyal Allweil
              Votes: 0
              Watchers: 2
