Spark / SPARK-31704

PandasUDFType.GROUPED_AGG with Java 11


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version: 3.0.0
    • Fix Version: None
    • Component: PySpark
    • Environment: Java JDK 11, Python 3.7

    Description

Running the example from the docs fails with an error under Java 11. The same code works under Java 8.

      import findspark
      findspark.init('/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7')
      from pyspark.sql.functions import pandas_udf, PandasUDFType
      from pyspark.sql import Window
      from pyspark.sql import SparkSession
      
      if __name__ == '__main__':
          spark = SparkSession \
              .builder \
              .appName('test') \
              .getOrCreate()
      
          df = spark.createDataFrame(
              [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
              ("id", "v"))
      
          @pandas_udf("double", PandasUDFType.GROUPED_AGG)
          def mean_udf(v):
              return v.mean()
      
          w = (Window.partitionBy('id')
               .orderBy('v')
               .rowsBetween(-1, 0))
          df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()
      
      File "/usr/local/lib/spark-3.0.0-preview2-bin-hadoop2.7/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
      py4j.protocol.Py4JJavaError: An error occurred while calling o81.showString.
      : org.apache.spark.SparkException: Job aborted due to stage failure: Task 44 in stage 7.0 failed 1 times, most recent failure: Lost task 44.0 in stage 7.0 (TID 37, 131.130.32.15, executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
      	at io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:473)
      	at io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
      	at io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
      	at io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:245)
      	at org.apache.arrow.vector.ipc.message.ArrowRecordBatch.computeBodyLength(ArrowRecordBatch.java:222)
      	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:240)
      	at org.apache.arrow.vector.ipc.ArrowWriter.writeRecordBatch(ArrowWriter.java:132)
      	at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:120)
      	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.$anonfun$writeIteratorToStream$1(ArrowPythonRunner.scala:94)
      	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
      	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
      	at org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.writeIteratorToStream(ArrowPythonRunner.scala:101)
      	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:373)
      	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
      	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:213)
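
      The failure comes from Netty being unable to allocate a direct buffer for Arrow serialization on Java 9+, where reflective access to sun.misc.Unsafe is restricted. A commonly documented workaround (from the Spark 3.0 / Arrow Java 11 notes, not confirmed in this report; the script name is a placeholder) is to pass the Netty reflection flag to both the driver and executor JVMs:

      ```shell
      # Allow Netty to reflectively create direct buffers on Java 11,
      # which Arrow needs when serializing record batches for pandas UDFs.
      # "your_app.py" is a hypothetical application path.
      spark-submit \
        --conf "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
        --conf "spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true" \
        your_app.py
      ```

      The same options can also be set in spark-defaults.conf or via SparkSession.builder.config() before the session is created.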
      

People

    Assignee: Unassigned
    Reporter: Markus Tretzmüller (_tretzi)