SPARK-32414: PySpark crashes in cluster mode with Kafka structured streaming


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: Structured Streaming
    • Labels: None
    • Environment:
      • Spark version 3.0.0 (installed via Homebrew on macOS)
      • Kubernetes: kind 18+
      • Kafka cluster: strimzi/kafka:0.18.0-kafka-2.5.0
      • Kafka package: org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0

    Description

      Hello,

      I have been trying to run a PySpark script on Spark on Kubernetes, and the application crashes with the following error:

      java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

       

      I submitted the job with the following command:

      spark-submit \
        --master k8s://https://127.0.0.1:53979 \
        --name spark-pi \
        --deploy-mode cluster \
        --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.0 \
        --conf spark.kubernetes.container.image=fifoosab/pytest:3.0.0.dev0 \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kubernetes.executor.request.cores=1 \
        --conf spark.kubernetes.driver.request.cores=1 \
        --conf spark.kubernetes.container.image.pullPolicy=Always \
        local:///usr/bin/spark.py

       

      Full logs of the error are in the attachments.
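
      The attached spark.py is not reproduced here. For context, a minimal sketch of the kind of PySpark structured streaming job being submitted is shown below; the bootstrap server, topic name, and checkpoint path are placeholders, not the values used in the attachment.

      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("kafka-structured-streaming-test")
          .getOrCreate()
      )

      # Read a Kafka topic as a streaming DataFrame. This requires the
      # spark-sql-kafka-0-10 package passed via --packages above.
      df = (
          spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "my-cluster-kafka-bootstrap:9092")  # placeholder
          .option("subscribe", "my-topic")  # placeholder
          .load()
      )

      # Kafka keys and values arrive as binary; cast them to strings before printing.
      values = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

      query = (
          values.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/checkpoints/kafka-test")  # placeholder
          .outputMode("append")
          .start()
      )

      query.awaitTermination()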

      Attachments

        1. spark.py (1 kB), uploaded by cyrille cazenave
        2. fulllogs.txt (72 kB), uploaded by cyrille cazenave


          People

            Assignee: Unassigned
            Reporter: cyrille cazenave
            Votes: 0
            Watchers: 2
