Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Component/s: None
Environment:
- Spark version 3.0.0 from Homebrew on macOS
- Kubernetes: kind 18+
- Kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0
- Kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0
Description
Hello,
I have been trying to run a PySpark script with Spark on Kubernetes, and it crashes the application with this error:

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
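For context, spark.py is essentially a minimal structured-streaming read from Kafka, roughly of the shape sketched below (a sketch only; the bootstrap server and topic name are placeholders, not the actual values):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-pi").getOrCreate()

# "my-cluster-kafka-bootstrap:9092" and "my-topic" are placeholder names,
# not the real Strimzi service or topic.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-cluster-kafka-bootstrap:9092")
    .option("subscribe", "my-topic")
    .load()
)

# The ClassCastException above is thrown once the streaming query starts
# running tasks on the executors.
query = (
    df.selectExpr("CAST(value AS STRING) AS value")
    .writeStream
    .format("console")
    .start()
)
query.awaitTermination()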
I followed these steps:
- For Spark on Kubernetes: https://spark.apache.org/docs/latest/running-on-kubernetes.html (this includes building the image with docker-image-tool.sh on macOS, using the -p flag)
- Also tried the image published by the devs of GoogleCloudPlatform/spark-on-k8s-operator (gcr.io/spark-operator/spark-py:v3.0.0), with the same issue
- For Kafka streaming: https://spark.apache.org/docs/3.0.0/structured-streaming-kafka-integration.html#deploying
- When running the script manually in a Jupyter notebook (jupyter/pyspark-notebook:latest, Spark 3.0.0) in local mode (with PYSPARK_SUBMIT_ARGS=--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell), it ran without issue (see the local-mode sketch after this list)
- The command run from the laptop is:
spark-submit \
  --master k8s://https://127.0.0.1:53979 \
  --name spark-pi \
  --deploy-mode cluster \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 \
  --conf spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.executor.request.cores=1 \
  --conf spark.kubernetes.driver.request.cores=1 \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  local:///usr/bin/spark.py
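The local-mode check mentioned above looks roughly like this (a sketch; the notebook just sets PYSPARK_SUBMIT_ARGS before the session is created so the Kafka connector is pulled in):

import os

# Must be set before the first SparkSession (and its JVM) is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 pyspark-shell"
)

from pyspark.sql import SparkSession

# local[*] mode inside jupyter/pyspark-notebook; the same job runs fine here.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("kafka-local-check")
    .getOrCreate()
)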
Full logs of the error are in the attachments.