SPARK-4040: Update Spark documentation for local mode and spark-streaming.

Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Documentation
    • Labels: None

    Description

      Note: this JIRA has changed since its inception. It's not a bug, but something that can be tricky to surmise from the existing docs, so the attached patch is a doc improvement.

      Below is the original JIRA which was filed:

      Please note that I'm somewhat new to Spark Streaming's API and am not a Spark expert, so I've done my best to write up and reproduce this "bug". If it's not a bug, I hope an expert will help explain why and promptly close it. However, after discussing it with rnowling, who is a Spark contributor, it appears it could be a bug.

      CC rnowling willbenton

      It appears that in a DStream context, a call to MappedRDD.count() blocks progress and prevents emission of RDDs from a stream.

          tweetStream.foreachRDD((rdd, lent) => {
            tweetStream.repartition(1)
            // val count = rdd.count()  // DON'T DO THIS!
            checks += 1
            // Stop the streaming context after 20 batch intervals.
            if (checks > 20) {
              ssc.stop()
            }
          })
      

      The above code block should inevitably halt after 20 intervals of RDDs. However, if we uncomment the call to rdd.count(), we get an infinite stream that emits no RDDs, and thus our program runs forever (ssc.stop() is unreachable) because foreachRDD doesn't receive any more entries.

      I suspect this is because the foreachRDD block never completes: count() winds up calling compute, which ultimately just reads from the stream.

      I haven't put together a minimal reproducer or unit test yet, but I can work on doing so if more info is needed.

      I guess this could be seen as an application bug, but I think Spark might be made smarter, throwing its hands up when people execute blocking code in a stream processor.
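
      The issue title points at local mode. One likely reading, consistent with the Spark Streaming programming guide's advice to use a master of local[n] with n greater than the number of receivers, is that the only local thread was held by the receiver, so a job such as rdd.count() could never be scheduled. The sketch below is a plain-Scala analogy and not Spark code: a fixed thread pool stands in for the local-mode threads, and the names LocalModeStarvation and batchRuns are hypothetical.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

object LocalModeStarvation {
  /** Returns true if the stand-in "batch job" ran while the "receiver" still held its thread. */
  def batchRuns(threads: Int): Boolean = {
    // Assumption: the pool models the threads available in local[threads].
    val pool = Executors.newFixedThreadPool(threads)
    val receiverStarted = new CountDownLatch(1)
    val stopReceiver = new CountDownLatch(1)
    val batchDone = new CountDownLatch(1)
    try {
      // Stand-in for a receiver: a long-running task that occupies one thread.
      pool.submit(new Runnable {
        def run(): Unit = {
          receiverStarted.countDown()
          stopReceiver.await()
        }
      })
      receiverStarted.await()

      // Stand-in for a job such as rdd.count(): it needs a free thread to run.
      pool.submit(new Runnable {
        def run(): Unit = batchDone.countDown()
      })

      // Wait briefly; false means the job never got a thread in time.
      batchDone.await(500, TimeUnit.MILLISECONDS)
    } finally {
      stopReceiver.countDown()
      pool.shutdownNow()
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"1 thread:  batch ran = ${batchRuns(1)}")
    println(s"2 threads: batch ran = ${batchRuns(2)}")
  }
}
```

      With one thread the stand-in job stays queued behind the receiver; with two it runs immediately. If the same thing is happening in the report above, running with a master such as local[2] would be the thing to try.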

            People

              Assignee: jayunit100 (jay vyas)
              Reporter: jayunit100 (jay vyas)