Spark / SPARK-4040

Update spark documentation for local mode and spark-streaming.


    Details

    • Type: Documentation
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.2.0
    • Component/s: Documentation
    • Labels:
      None

      Description

      Note: this JIRA has changed since its inception. It's not a bug, but something that can be tricky to surmise from the existing docs, so the attached patch is a doc improvement.

      Below is the original JIRA as it was filed:

      Please note that I'm somewhat new to Spark Streaming's API and am not a Spark expert, so I've done my best to write up and reproduce this "bug". If it's not a bug, I hope an expert will help explain why and promptly close it. However, it appears it could be a bug after discussing with R J Nowling, who is a Spark contributor.

      CC R J Nowling William Benton

      It appears that, in a DStream context, a call to MappedRDD.count() blocks progress and prevents the emission of further RDDs from the stream.

          tweetStream.foreachRDD((rdd, time) => {
            tweetStream.repartition(1)
            // val count = rdd.count()  // DON'T DO THIS!
            checks += 1
            if (checks > 20) {
              ssc.stop()
            }
          })
      

      The above code block should inevitably halt after 20 intervals of RDDs. However, if we uncomment the call to rdd.count(), it turns out that we get an infinite stream which emits no RDDs, and thus our program runs forever (ssc.stop is unreachable), because foreachRDD doesn't receive any more entries.

      I suspect this is actually because the foreach block never completes: count() winds up calling compute, which ultimately just reads from the stream.

      I haven't put together a minimal reproducer or unit test yet, but I can work on doing so if more info is needed.

      I guess this could be seen as an application bug, but I think Spark might be made smarter to throw its hands up when people execute blocking code in a stream processor.
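
      The behavior described above is consistent with core starvation in local mode: each receiver-based input DStream pins one core, so a master of "local" or "local[1]" leaves no cores free for processing, and any action such as rdd.count() can then block indefinitely. A minimal sketch of the context setup the updated docs recommend (the app name here is illustrative, not from the original report):

          import org.apache.spark.SparkConf
          import org.apache.spark.streaming.{Seconds, StreamingContext}

          // Use local[n] with n greater than the number of receivers.
          // "local" or "local[1]" would leave no core for batch processing.
          val conf = new SparkConf()
            .setMaster("local[2]")  // 1 core for the receiver + 1 for processing
            .setAppName("StreamingCountExample")

          val ssc = new StreamingContext(conf, Seconds(1))

      With local[2] (or more), the receiver and the batch-processing tasks can run concurrently, so actions like rdd.count() inside foreachRDD complete normally instead of starving the stream.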

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                jayunit100 jay vyas
                Reporter:
                jayunit100 jay vyas
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: