Uploaded image for project: 'Zeppelin'
  1. Zeppelin
  2. ZEPPELIN-2156

Paragraph with PySpark streaming - running job cannot be canceled

Attach filesAttach ScreenshotAdd voteVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.7.0
    • None
    • pySpark
    • Linux Ubuntu

    Description

      In a 'spark.pyspark' paragraph a StreamingContext to a Kafka stream is created.

      The paragraph is started and while the job is running the spark context produces correct output from the code.

      The problem is the job cannot be stopped in the Zeppelin web interface.

      Installed Kafka version: kafka_2.11-0.8.2.2
      Spark Kafka jar: spark-streaming-kafka-0-8_2.11-2.1.0.jar
      Zeppelin: zeppelin-0.7.0-bin-all

      Tried:
      1. Paragraph Cancel ( || button ) has no effect.

      2. Zeppelin Job view Stop All has no effect

      3. Another paragraph with
      %spark.pyspark
      ssc.stop(stopSparkContext=false, stopGracefully=true)

      is started by stays in 'Pending'

      4. Restarting the 'spark' interpreter stops the job

      The example logic:

      %spark.pyspark
      import sys
      import json

      from pyspark import SparkContext
      from pyspark.streaming import StreamingContext(II())
      from pyspark.streaming.kafka import KafkaUtils

      zkQuorum, topic, interval = ('localhost:2181', 'airport', 60)

      ssc = StreamingContext(sc, interval)

      kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",

      {topic: 1}

      )

      parsed = kvs.map(lambda (k, v): json.loads(v))
      summed = parsed.\
      filter(lambda event: 'kind' in event and event['kind']=='gate').\
      map(lambda event: ('count_all', int(event['value']['passengers']))).\
      reduceByKey(lambda x,y: x + y).\
      map(lambda x:

      {'sum': x[0], "passengers": x[1]}

      )
      summed.pprint()

      ssc.start()
      ssc.awaitTermination()

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned
            PiotrNestor Piotr Nestorow

            Dates

              Created:
              Updated:

              Slack

                Issue deployment