Uploaded image for project: 'Zeppelin'
  1. Zeppelin
  2. ZEPPELIN-2156

Paragraph with PySpark streaming - running job cannot be canceled

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: pySpark
    • Environment:

      Linux Ubuntu

      Description

      In a 'spark.pyspark' paragraph a StreamingContext to a Kafka stream is created.

      The paragraph is started and while the job is running the spark context produces correct output from the code.

      The problem is the job cannot be stopped in the Zeppelin web interface.

      Installed Kafka version: kafka_2.11-0.8.2.2
      Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
      Zeppelin: zeppelin-0.7.0-bin-all

      Tried:
      1. Paragraph Cancel ( || button ) has no effect.

      2. Zeppelin Job view Stop All has no effect

      3. Another paragraph with
      %spark.pyspark
      ssc.stop(stopSparkContext=false, stopGracefully=true)

      is started by stays in 'Pending'

      4. Restarting the 'spark' interpreter stops the job

      The example logic:

      %spark.pyspark
      import sys
      import json

      from pyspark import SparkContext
      from pyspark.streaming import StreamingContext(II())
      from pyspark.streaming.kafka import KafkaUtils

      zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)

      ssc = StreamingContext(sc, interval)

      kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",

      {topic: 1}

      )

      parsed = kvs.map(lambda (k, v): json.loads(v))
      summed = parsed.\
      filter(lambda event: 'kind' in event and event['kind']=='gate').\
      map(lambda event: ('count_all', int(event['value']['passengers']))).\
      reduceByKey(lambda x,y: x + y).\
      map(lambda x:

      {'sum': x[0], "passengers": x[1]}

      )
      summed.pprint()

      ssc.start()
      ssc.awaitTermination()

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              PiotrNestor Piotr Nestorow
            • Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated: